(see Rule 70.16 and Section 607 of the Administrative Instructions under the PCT).



These annexes consist of a total 
Basis of the report
Priority 
Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
citations and explanations supporting such statement 
□
International application No.
I. Basis of the report



1. With regard to the elements of the international application:*

[x] the international application as originally filed

[x] the description:

pages ____, as originally filed
pages NONE, filed with the demand
pages NONE, filed with the letter of



[x] the claims:

pages 47-61, as originally filed
pages NONE, as amended (together with any statement) under Article 19
pages NONE, filed with the demand
pages NONE, filed with the letter of



[x] the drawings:

pages 1-16, as originally filed
pages NONE, filed with the demand
pages NONE, filed with the letter of



[x] the sequence listing part of the description:

pages NONE, as originally filed
pages NONE, filed with the demand
pages NONE, filed with the letter of



2. With regard to the language, all the elements marked above were available or furnished to this Authority in the language in which
the international application was filed, unless otherwise indicated under this item.

These elements were available or furnished to this Authority in the following language, which is:

[ ] the language of a translation furnished for the purposes of international search (under Rule 23.1(b)).
[ ] the language of publication of the international application (under Rule 48.3(b)).

[ ] the language of the translation furnished for the purposes of international preliminary examination (under Rules 55.2 and/or 55.3).

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the international
preliminary examination was carried out on the basis of the sequence listing:

[ ] contained in the international application in printed form.
[ ] filed together with the international application in computer readable form.
[ ] furnished subsequently to this Authority in written form.
[ ] furnished subsequently to this Authority in computer readable form.

[ ] The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in the
international application as filed has been furnished.

[ ] The statement that the information recorded in computer readable form is identical to the written sequence listing has
been furnished.

4. [x] The amendments have resulted in the cancellation of:

[x] the description, pages NONE

[x] the claims, Nos. NONE

[x] the drawings, sheets/fig NONE

5. [x] This report has been drawn as if (some of) the amendments had not been made, since they have been considered to go
beyond the disclosure as filed, as indicated in the Supplemental Box (Rule 70.2(c)).**
* Replacement sheets which have been furnished to the receiving Office in response to an invitation under Article 14 are referred to 
in this report as "originally filed" and are not annexed to this report since they do not contain amendments (Rules 70.16 
and 70.17). 

**Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this report. 
V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
citations and explanations supporting such statement 

1. Statement



Novelty (N)                      Claims 1-75                         YES
                                 Claims NONE                         NO

Inventive Step (IS)              Claims 11-24, 35-48, and 57-75      YES
                                 Claims 1-10, 25-34, and 49-56       NO

Industrial Applicability (IA)    Claims 1-75                         YES
                                 Claims NONE                         NO



2. Citations and explanations (Rule 70.7)

Claims 1-10, 25-34, and 49-56 lack an inventive step under PCT Article 33(3) as being obvious over US 5,742,283 (KIM).



As to claims 1, 25, 49, and 50, Kim teaches the invention substantially as claimed, comprising: "at least one video input
interface for receiving said video information" (col. 1, lines 19-30), "processing said video information by performing video
object extraction processing to generate video object descriptions from said video information" (col. 1, lines 38-63), "processing
said generated video object descriptions by object hierarchy construction and extraction processing to generate video object
hierarchy descriptions" (col. 2, lines 66-67 and col. 3, lines 1-12), "processing said generated video object descriptions by entity
relation graph generation processing to generate entity relation graph descriptions wherein at least one description record
including said video object descriptions" (col. 13, lines 23-40), "said video object hierarchy descriptions and said entity relation
graph descriptions is generated to represent content embedded within said video information" (col. 13, lines 31-65), and "a data
storage system, operatively coupled to said processor, for storing said at least one description record" (col. 5, lines 1-15). Kim
did not explicitly teach "a computer processor coupled to at least one video input interface for receiving the video information";
however, it would have been obvious to one of ordinary skill in the art at the time the invention was made to use a computer
processor coupled to at least one video input interface for receiving the video information to perform these steps, in view of
Kim's teachings (as taught in the background of the invention section, columns 1-2).

With respect to claims 2 and 26, Kim teaches the invention substantially as claimed, "video extraction processing and said object 
hierarchy construction and extraction processing are performed in parallel" (col. 5, lines 22-48). 

With respect to claims 3 and 27, Kim teaches, "the video object extraction processing comprises: video segmentation processing 
to segment each video in said video information into regions within (Continued on Supplemental Sheet.) 
CLASSIFICATION:



The International Patent Classification (IPC) and/or the National classification are as listed below: 
IPC(7): G06F 17/30 and US Cl.: 707/1, 103, 204; 345/327, 418, 430, 431, 439, 475; 348/154, 155, 429.



I. BASIS OF REPORT: 

5. (Some) amendments are considered to go beyond the disclosure as filed: 
NONE 



V. 2. REASONED STATEMENTS - CITATIONS AND EXPLANATIONS (Continued): 

said video" (col. 5, lines 16-21), "feature extraction and annotation processing to generate one or more feature descriptions for 
one or more said regions" (col. 1, lines 53-63 and col. 48-65), and "said generated video object descriptions comprise said one 
or more feature descriptions for one or more said regions" (col. 5, lines 66-67 and col. 6, lines 1-16). Kim did not teach 
annotation processing; however, it would have been obvious to one having ordinary skill in the art at the time the invention
was made to have annotation processing in performing step 1, because annotation is a way of adding a comment or an 
explanation to the descriptions of the regions of the video information. 

With respect to claims 4 and 28, Kim teaches the invention substantially as claimed, "said regions are selected from the group 
consisting of local, segment, and global regions" (col. 5, lines 2-7). 

With respect to claims 5, 29, and 51, Kim teaches the invention substantially as claimed, "said one or more feature descriptions
are selected from the group consisting of media features, visual features, temporal features, and semantic features" (col. 5, 
lines 10-15, lines 22-28, and lines 48-51). 

With respect to claims 6, 30, and 52, Kim teaches, "said semantic features are further defined by at least one feature
description selected from the group consisting of who, what object, what action, where, when, why, and text annotation" (col.
4, lines 12-37). Kim did not explicitly teach the group consisting of who, what object, where, when, why, and text
annotation; however, it would have been obvious to one having ordinary skill in the art at the time the invention was made to
have semantic features with a description selected from a group because the playing of the document can be altered by the 
definition and occurrence of the who, what object, where, when, why, and text annotation. 

With respect to claims 7, 31, and 53, Kim did not explicitly teach, "the visual features are further defined by at least one 
feature description selected from the group consisting of color, texture, position, size, shape, motion, camera motion, editing
effect, and orientation," but it would have been obvious to one having ordinary skill in the art at the time the invention was
made to have visual features defined by color, texture, position, size, shape, motion, camera motion, editing effect, and
orientation because the spatial characteristics include the shape, size, color, texture, position, motion, camera motion, editing
effect, and orientation of the episode on display (see col. 4, lines 48-67). 

With respect to claims 8, 32, and 54, Kim does not teach, "media features are further defined by at least one feature 
description selected from the group consisting of media features of file format, file size, color representation, resolution, data 
file location, author, creation, scalable layer, and modality transcoding," but it would have been obvious to one having ordinary
skill in the art at the time the invention was made to have media features consisting of file format, file size, color 
representation, resolution, data file location, author, creation, scalable layer, and modality transcoding because the episodes of 
the temporal layout are organized into groups called temporal cliques and independently conform to their own spatial 
organization of media features. 

With respect to claims 9, 33, and 55, Kim teaches, "the temporal features are further defined by at least one feature 
description selected from the group consisting of start time, end time, and duration" (col. 6, lines 6-16). 

With respect to claims 10, 34, and 56, Kim teaches, "said object hierarchy construction and extraction processing generates video
object hierarchy descriptions of said video object descriptions based on visual feature relationships of video objects represented
by said video object descriptions" (col. 12, lines 51-64).



Claims 11-24, 35-48, and 57-75 meet the criteria set out in PCT Article 33(2)-(4) because the prior art does not teach or
fairly suggest their distinguishing features. As to claims 11-24, 35-48, and 57-75, the object hierarchy construction and extraction
process generating video hierarchy descriptions of the video object descriptions based on semantic feature relationships of the
video objects represented by the video object descriptions and based on the media feature relationships, the hierarchical levels
comprising clustering hierarchies and multiple levels of abstraction hierarchies, taken together with the other claim limitations,
was not disclosed by the prior art of record.

Claims 1-75 meet the criteria set out in PCT Article 33(4) because the techniques used for describing multimedia information
and video information and the content of such information can be used to allow consumers and businesses to search for textual
information on the World Wide Web.



RESPONSE TO ARGUMENTS

Applicants' arguments have been considered but are not persuasive in view of the original grounds of rejection. The Examiner
considers the limitations argued by Applicants, "the generation of video object descriptions by performing video object
extraction processing to generate video object descriptions" or "the generation of video object hierarchies by object hierarchy
construction and extraction processing to generate video object hierarchy descriptions," not to be found in claims 1, 25, and
49. Claim 1 recites "processing said video information by performing object extraction processing to
generate video object descriptions by object hierarchy construction and extraction processing to generate video object hierarchy
descriptions ..." Claim 25 recites "processing said video information by performing video object extraction processing to
generate video object descriptions from said video object information; processing said generated video object descriptions by
object hierarchy construction and extraction processing to generate video object hierarchy descriptions..." Claim 49 recites
"one or more object descriptions generated from said video information using video object extraction processing; one or more
video object descriptions generated from said generated video object descriptions using object hierarchy construction and
extraction processing." Therefore, the Examiner considers the Kim reference to teach the limitations of claims 1, 25, and 49
as recited in the above rejection.



NEW CITATIONS 



NONE 
PCT

WRITTEN OPINION

(PCT Rule 66)
Date of Mailing

29 NOV 2000

International application No.

1. This written opinion is the first (first, etc.) drawn by this International Preliminary Examining Authority.



2. This opinion contains indications relating to the following items:

I      [x]  Basis of the opinion
II     [ ]  Priority
III    [ ]
IV     [ ]
V      [x]  Reasoned statement under Rule 66.2(a)(ii) with regard to novelty, inventive step or industrial applicability;
            citations and explanations supporting such statement
VI     [ ]  Certain documents cited
VII    [ ]  Certain defects in the international application
VIII   [ ]  Certain observations on the international application






3. The applicant is hereby invited to reply to this opinion. 

When? See the time limit indicated above. The applicant may, before the expiration of that time limit, request this
Authority to grant an extension, see Rule 66.3(d).

How? By submitting a written reply, accompanied, where appropriate, by amendments, according to Rule 66.3. 

For the form and the language of the amendments, see Rules 66.8 and 66.9.

Also: For an additional opportunity to submit amendments, see Rule 66.4.

For the examiner's obligation to consider amendments and/or arguments, see Rule 66.4bis.

For an informal communication with the examiner, see Rule 66.6.
If no reply is filed, the international preliminary examination report will be established on the basis of this opinion.

4. The final date by which the international preliminary examination report must be established according to Rule 69.2 is:
06 MARCH 2001.
Washington, D.C. 20231
I. Basis of the opinion



1. With regard to the elements of the international application:*
[x] the international application as originally filed
the description:

pages 1-46, as originally filed
pages NONE, filed with the demand
pages NONE, filed with the letter of

[x] the claims:

pages 47-61, as originally filed
pages NONE, as amended (together with any statement) under Article 19
pages NONE, filed with the demand
pages NONE, filed with the letter of

[x] the drawings:

pages 1-16, as originally filed
pages NONE, filed with the demand
pages NONE, filed with the letter of

[x] the sequence listing part of the description:

pages NONE, as originally filed
pages NONE, filed with the demand
pages NONE, filed with the letter of

2. With regard to the language, all the elements marked above were available or furnished to this Authority in the language in which
the international application was filed, unless otherwise indicated under this item.

These elements were available or furnished to this Authority in the following language, which is:

[ ] the language of a translation furnished for the purposes of international search (under Rule 23.1(b)).
[ ] the language of publication of the international application (under Rule 48.3(b)).

[ ] the language of the translation furnished for the purposes of international preliminary examination (under Rules 55.2 and/or 55.3).

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the written opinion was
drawn on the basis of the sequence listing:

[ ] contained in the international application in printed form.
[ ] filed together with the international application in computer readable form.
[ ] furnished subsequently to this Authority in written form.
[ ] furnished subsequently to this Authority in computer readable form.

[ ] The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in the
international application as filed has been furnished.

[ ] The statement that the information recorded in computer readable form is identical to the written sequence listing has
been furnished.

4. [x] The amendments have resulted in the cancellation of:

[x] the description, pages NONE

[x] the claims, Nos. NONE

[x] the drawings, sheets/fig NONE



5. [ ] This opinion has been drawn as if (some of) the amendments had not been made, since they have been considered to go
beyond the disclosure as filed, as indicated in the Supplemental Box (Rule 70.2(c)).

* Replacement sheets which have been furnished to the receiving Office in response to an invitation under Article 14 are referred to
in this opinion as "originally filed". 
TIME LIMIT:

The time limit set for response to a Written Opinion may not be extended. 37 CFR 1.484(d). Any response 
received after the expiration of the time limit set in the Written Opinion will not be considered in preparing the International 
Preliminary Examination Report. 

CLASSIFICATION: 

The International Patent Classification (IPC) and/or the National classification are as listed below:
IPC(7): G06F 17/30 and US Cl.: 707/1, 103, 204; 345/327, 418, 430, 431, 439, 475; 348/154, 155, 429.

V. 2. REASONED STATEMENTS - CITATIONS AND EXPLANATIONS (Continued): 

said video" (col. 5, lines 16-21), "feature extraction and annotation processing to generate one or more feature descriptions for 
one or more said regions" (col. 1, lines 53-63 and col. 48-65), and "said generated video object descriptions comprise said one 
or more feature descriptions for one or more said regions" (col. 5, lines 66-67 and col. 6, lines 1-16). Kim did not teach 
annotation processing; however, it would have been obvious to one having ordinary skill in the art at the time the invention
was made to have annotation processing in performing step 1, because annotation is a way of adding a comment or an
explanation to the descriptions of the regions of the video information. 

With respect to claims 4 and 28, Kim teaches the invention substantially as claimed, "said regions are selected from the group 
consisting of local, segment, and global regions" (col. 5, lines 2-7). 

With respect to claims 5, 29, and 51, Kim teaches the invention substantially as claimed, "said one or more feature descriptions 
are selected from the group consisting of media features, visual features, temporal features, and semantic features" (col. 5, 
lines 10-15, lines 22-28, and lines 48-51). 

With respect to claims 6, 30, and 52, Kim teaches, "said semantic features are further defined by at least one feature
description selected from the group consisting of who, what object, what action, where, when, why, and text annotation" (col.
4, lines 12-37). Kim did not explicitly teach the group consisting of who, what object, where, when, why, and text
annotation; however, it would have been obvious to one having ordinary skill in the art at the time the invention was made to
have semantic features with a description selected from a group because the playing of the document can be altered by the 
definition and occurrence of the who, what object, where, when, why, and text annotation. 

With respect to claims 7, 31, and 53, Kim did not explicitly teach, "the visual features are further defined by at least one 
feature description selected from the group consisting of color, texture, position, size, shape, motion, camera motion, editing
effect, and orientation," but it would have been obvious to one having ordinary skill in the art at the time the invention was
made to have visual features defined by color, texture, position, size, shape, motion, camera motion, editing effect, and
orientation because the spatial characteristics include the shape, size, color, texture, position, motion, camera motion, editing
effect, and orientation of the episode on display (see col. 4, lines 48-67). 

With respect to claims 8, 32, and 54, Kim does not teach, "media features are further defined by at least one feature 
description selected from the group consisting of media features of file format, file size, color representation, resolution, data 
file location, author, creation, scalable layer, and modality transcoding," but it would have been obvious to one having ordinary
skill in the art at the time the invention was made to have media features consisting of file format, file size, color 
representation, resolution, data file location, author, creation, scalable layer, and modality transcoding because the episodes of 
the temporal layout are organized into groups called temporal cliques and independently conform to their own spatial 
organization of media features. 

With respect to claims 9, 33, and 55, Kim teaches, "the temporal features are further defined by at least one feature 
description selected from the group consisting of start time, end time, and duration" (col. 6, lines 6-16). 

With respect to claims 10, 34, and 56, Kim teaches, "said object hierarchy construction and extraction processing generates video
object hierarchy descriptions of said video object descriptions based on visual feature relationships of video objects represented
by said video object descriptions" (col. 12, lines 51-64).

Claims 11-24, 35-48, and 57-75 meet the criteria set out in PCT Article 33(2)-(4) because the prior art does not teach or
fairly suggest their distinguishing features. As to claims 11-24, 35-48, and 57-75, the object hierarchy construction and extraction
process generating video hierarchy descriptions of the video object descriptions based on semantic feature relationships of the
video objects represented by the video object descriptions and based on the media feature relationships, the hierarchical levels
comprising clustering hierarchies and multiple levels of abstraction hierarchies, taken together with the other claim limitations,
was not disclosed by the prior art of record.

Claims 1-75 meet the criteria set out in PCT Article 33(4) because the techniques used for describing multimedia information
and video information and the content of such information can be used to allow consumers and businesses to search for textual 
information on the World Wide Web. 

NEW CITATIONS 

NONE 
C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

Y           US 5,555,354 A (STRASNICK et al) 10 September 1996, col. 6, lines 4-67, col. 7, lines 1-38, col. 8,
            lines 9-47, col. 9, lines 37-44 and lines 66-67, col. 10, lines 1-23, col. 12, lines 1-42, col. 15,
            lines 20-67, col. 16, lines 1-47, col. 17, lines 6-18, col. 19, lines 57-67, col. 20, lines 52-67,
            col. 21, lines 6-47, col. 22, lines 16-50, col. 23, lines 49-53, col. 26, lines 49-67, col. 27,
            lines 1-5, and lines 21-62, and col. 30, lines 37-44.
            Relevant to claim No.: 1-75

Y           US 5,742,283 A (KIM) 21 April 1998, col. 1, lines 19-67, col. 2, lines 1-67, col. 3, lines 1-12 and
            lines 58-65, col. 4, lines 8-67, col. 5, lines 22-67, col. 6, lines 1-41, col. 7, lines 2-56, col. 10,
            lines 18-67, col. 11, lines 1-22 and lines 39-67, col. 12, lines 1-67, col. 13, lines 1-67, col. 14,
            lines 41-63, col. 15, lines 28-67, col. 16, lines 1-67, col. 17, lines 1-9 and lines 17-45, and col. 18,
            lines 1-43.
            Relevant to claim No.: 1-75
A. CLASSIFICATION OF SUBJECT MATTER

IPC(7) : G06F 17/30 

US CL : Please See Extra Sheet.
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols) 

U.S. : 707/1, 103, 204; 345/327, 418, 430, 431, 439, 475; 348/154, 155, 429. 



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
Please See Extra Sheet. 



C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

            US 5,821,945 A (YEO et al) 13 October 1998, col. 1, lines 31-67, col. 2, lines 1-67, col. 3, lines 1-67,
            col. 4, lines 1-67, col. 5, lines 1-2 and 23-67, col. 6, lines 1-67, col. 7, lines 1-67, col. 8,
            lines 1-67, col. 9, lines 1-67, col. 10, lines 1-67, col. 11, lines 1-6 and lines 16-40, and col. 12,
            lines 1-38.
[x] Further documents are listed in the continuation of Box C.   [ ] See patent family annex.

* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance

"E" earlier document published on or after the international filing date

"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another
    citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or other means

"P" document published prior to the international filing date but later than the priority date claimed

"T" later document published after the international filing date or priority date and not in conflict with the application
    but cited to understand the principle or theory underlying the invention

"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve
    an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the
    document is combined with one or more other such documents, such combination being obvious to a person skilled in the art

"&" document member of the same patent family



Date of the actual completion of the international search 
05 APRIL 2000 


Date of mailing of the international search report

10 MAY 2000


Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks 
Box PCT 

Washington, D.C. 20231
Facsimile No. (703)305-3230 


Authorized officer

HOSAIN ALAM
Telephone No. (703)308-6662 
From the INTERNATIONAL BUREAU



PCT 



NOTICE INFORMING THE APPLICANT OF THE 
COMMUNICATION OF THE INTERNATIONAL 
APPLICATION TO THE DESIGNATED OFFICES 

(PCT Rule 47.1(c), first sentence) 



To: 



TANG, Henry 
Baker & Botts, LLP 
30 Rockefeller Plaza 
New York, NY 10112-0228 

ETATS-UNIS D'AMERIQUE



Date of mailing (day/month/year) 

18 May 2000(18.05.00) 




IMPORTANT NOTICE


Applicant's or agent's file reference
32283-PCT 




International application No. 
PCT/US99/26126 


International filing date (day/month/year) 

05 November 1999 (05.11.99) 


Priority date (day/month/year) 

06 November 1998 (06.11.98) 


Applicant 

THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK et al 



1. Notice is hereby given that the International Bureau has communicated, as provided in Article 20, the international application 
to the following designated Offices on the date indicated above as the date of mailing of this Notice: 

AU,CN,JP,KP,KR,MA,US 

In accordance with Rule 47.1(c), third sentence, those Offices will accept the present Notice as conclusive evidence that 
the communication of the international application has duly taken place on the date of mailing indicated above and no copy 
of the international application is required to be furnished by the applicant to the designated Office(s). 

2. The following designated Offices have waived the requirement for such a communication at this time: 

AE^L^M^P^AZ^BA^B^BG^R^Y^CA^H^R^U^CZ^DE^DK^M^A^E^^ES^FlGB^aGE, 

GH^M^H^HUJDJLJN^S^^KG^KZ.LC^K^LR^SXT^U^^MaMaM^MKMW^X^O^^OA, 

PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW
The communication will be made to those Offices only upon their request. Furthermore, those Offices do not require the 
applicant to furnish a copy of the international application (Rule 49.1(a-bis)).

3. Enclosed with this Notice is a copy of the international application as published by the International Bureau on
18 May 2000 (18.05.00) under No. WO 00/28725 

REMINDER REGARDING CHAPTER II (Article 31(2)(a) and Rule 54.2)

If the applicant wishes to postpone entry into the national phase until 30 months (or later in some Offices) from the priority 
date, a demand for international preliminary examination must be filed with the competent International Preliminary 
Examining Authority before the expiration of 19 months from the priority date. 

It is the applicant's sole responsibility to monitor the 19-month time limit.

Note that only an applicant who is a national or resident of a PCT Contracting State which is bound by Chapter II has the 
right to file a demand for international preliminary examination. 

REMINDER REGARDING ENTRY INTO THE NATIONAL PHASE (Article 22 or 39(1))

If the applicant wishes to proceed with the international application in the national phase, he must within 20 months 
or 30 months, or later in some Offices, perform the acts referred to therein before each designated or elected Office. 

For further important information on the time limits and acts to be performed for entering the national phase, see the 
Annex to Form PCT/IB/301 (Notification of Receipt of Record Copy) and Volume II of the PCT Applicant's Guide. 





The International Bureau of WIPO 
34, chemin des Colombettes 
1211 Geneva 20, Switzerland 

Facsimile No. (41-22) 740.14.35 


Authorized officer 

J. Zahra 

Telephone No. (41-22) 338.83.38
From the INTERNATIONAL SEARCHING AUTHORITY



To: HENRY TANG

BAKER & BOTTS, LLP 
30 ROCKEFELLER PLAZA 
NEW YORK, NY 10112-0228 
NOTIFICATION OF TRANSMITTAL OF
THE INTERNATIONAL SEARCH REPORT 
OR THE DECLARATION 



(PCT Rule 44.1) 



Applicant's or agent's file reference 
32283-PCT 



Date of Mailing 
(day/month/year)



10 MAY 2000 



FOR FURTHER ACTION See paragraphs 1 and 4 below 



International application No. 
PCT/US99/26126 



International filing date 
(day/month/year)

05 NOVEMBER 1999 



Applicant 

THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK 



1. [X] The applicant is hereby notified that the international search report has been established and is transmitted herewith.
Filing of amendments and statement under Article 19: 

The applicant is entitled, if he so wishes, to amend the claims of the international application (see Rule 46): 

When? The time limit for filing such amendments is normally 2 months from the date of transmittal of the
international search report; however, for more details, see the notes on the accompanying sheet. 



Where? Directly to the International Bureau of WIPO
34, chemin des Colombettes 
1211 Geneva 20, Switzerland 
Facsimile No.: (41-22)740.14.35 

For more detailed instructions, see the notes on the accompanying sheet. 




2. [ ] The applicant is hereby notified that no international search report will be established and that the declaration under
Article 17(2)(a) to that effect is transmitted herewith.

3. With regard to the protest against payment of (an) additional fee(s) under Rule 40.2, the applicant is notified that: 

[ ] the protest together with the decision thereon has been transmitted to the International Bureau together with the
applicant's request to forward the texts of both the protest and the decision thereon to the designated Offices. 

[ ] no decision has been made yet on the protest; the applicant will be notified as soon as a decision is made.

4. Further action(s): The applicant is reminded of the following: 

Shortly after 18 months from the priority date, the international application will be published by the International Bureau. 
If the applicant wishes to avoid or postpone publication, a notice of withdrawal of the international application, or of the 
priority claim, must reach the International Bureau as provided in Rules 90bis.1 and 90bis.3, respectively, before the
completion of the technical preparations for international publication. 

Within 19 months from the priority date, a demand for international preliminary examination must be filed if the applicant 
wishes to postpone the entry into the national phase until 30 months from the priority date (in some Offices even later). 

Within 20 months from the priority date, the applicant must perform the prescribed acts for entry into the national phase 
before all designated Offices which have not been elected in the demand or in a later election within 19 months from the 
priority date or could not be elected because they are not bound by Chapter II. 
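The three time limits recalled above all run from the priority date, 06 November 1998. The Python sketch below shows where they fall, assuming the month-counting convention of PCT Rule 80.2 (the period ends on the day with the same number in the later month, or on that month's last day if it has no such day) and ignoring any extension under Rule 80.5 when an Office is closed on the last day. The helper `add_months` and the dictionary labels are illustrative names only, not part of any official tool.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Keeps the same day number; if the target month is shorter, uses its
    last day (the convention assumed from PCT Rule 80.2). Closed-Office
    extensions (Rule 80.5) are not applied.
    """
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


PRIORITY_DATE = date(1998, 11, 6)  # earliest priority date of PCT/US99/26126

# Illustrative labels for the three acts recalled in item 4.
deadlines = {
    "demand for international preliminary examination (19 months)": add_months(PRIORITY_DATE, 19),
    "national phase, non-elected Offices (20 months)": add_months(PRIORITY_DATE, 20),
    "national phase, elected Offices (30 months)": add_months(PRIORITY_DATE, 30),
}

for act, deadline in deadlines.items():
    print(deadline.isoformat(), act)
```

Under these assumptions the demand would be due by 6 June 2000, national-phase entry before non-elected Offices by 6 July 2000, and entry before elected Offices by 6 May 2001.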




Name and mailing address of the ISA/US

Commissioner of Patents and Trademarks 
Box PCT 

Washington, D.C. 20231
Facsimile No. (703) 305-3230







Authorized officer

HOSAIN ALAM

Telephone No. (703) 308-6662 
INTERNATIONAL SEARCH REPORT
(PCT Article 18 and Rules 43 and 44) 



Applicant's or agent's file reference 
32283-PCT 


FOR FURTHER ACTION see Notification of Transmittal of International Search Report
(Form PCT/ISA/220) as well as, where applicable, item 5 below.


International application No. 
PCT/US99/26126 


International filing date (day/month/year) 
05 NOVEMBER 1999 


(Earliest) Priority Date (day/month/year) 
06 NOVEMBER 1998 


Applicant 

THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK



This international search report has been prepared by this International Searching Authority and is transmitted to the applicant 
according to Article 18. A copy is being transmitted to the International Bureau.

This international search report consists of a total of sheets. 

[X] It is also accompanied by a copy of each prior art document cited in this report.



1. Basis of the report 

a. With regard to the language, the international search was carried out on the basis of the international application in the 
language in which it was filed, unless otherwise indicated under this item. 

[ ] the international search was carried out on the basis of a translation of the international application furnished to this
Authority (Rule 23.1(b)). 

b. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the international search 
was carried out on the basis of the sequence listing:

[ ] contained in the international application in written form.

[ ] filed together with the international application in computer readable form.

[ ] furnished subsequently to this Authority in written form.

[ ] furnished subsequently to this Authority in computer readable form.

[ ] the statement that the subsequently furnished written sequence listing does not go beyond the disclosure in the
international application as filed has been furnished.

[ ] the statement that the information recorded in computer readable form is identical to the written sequence listing has been
furnished.

2. [ ] Certain claims were found unsearchable (See Box I).

3. [ ] Unity of invention is lacking (See Box II).

4. With regard to the title, 

[X] the text is approved as submitted by the applicant.

[ ] the text has been established by this Authority to read as follows:



5. With regard to the abstract, 

[ ] the text is approved as submitted by the applicant.

[X] the text has been established, according to Rule 38.2(b), by this Authority as it appears in
Box III. The applicant may, within one month from the date of mailing of this international
search report, submit comments to this Authority.

2 



6. The figure of the drawings to be published with the abstract is Figure No.
[ ] as suggested by the applicant.
[X] because the applicant failed to suggest a figure.
[ ] because this figure better characterizes the invention.

[ ] None of the figures.
Box III TEXT OF THE ABSTRACT (Continuation of item 5 of the first sheet)



NEW ABSTRACT 
Systems and methods for describing video content establish video
description records which include an object set (24), an object hierarchy (26) and
entity relation graphs (28). Video objects can include global objects, segment 
objects, and local objects. The video objects are further defined by a number of 
features organized in classes, which in turn are defined by a number of feature 
descriptors (36), (38), and (40). The relationships (44) between and among the 
objects in the object set (24) are defined by the object hierarchy (26) and entity 
relation graphs (28). The video description records provide a standard vehicle for 
describing the content and context of video information for subsequent access and 
processing by computer applications such as search engines, filters, and archive 
systems. 
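The abstract describes a record built from an object set, an object hierarchy and entity relation graphs, where objects carry features grouped in classes and each feature is defined by feature descriptors. The Python sketch below models that arrangement with dataclasses. Every class, field and relation name is a hypothetical illustration rather than the claimed description format, the hierarchy and graph are reduced to plain edge lists, and the usage lines assume, for illustration only, that a global object describes a whole video, a segment object a shot, and a local object something inside a shot.

```python
from __future__ import annotations

from dataclasses import dataclass, field

OBJECT_KINDS = {"global", "segment", "local"}


@dataclass
class FeatureDescriptor:
    name: str  # hypothetical, e.g. "dominant color"
    value: object


@dataclass
class Feature:
    feature_class: str  # hypothetical class name, e.g. "visual"
    descriptors: list[FeatureDescriptor] = field(default_factory=list)


@dataclass
class VideoObject:
    object_id: str
    kind: str  # one of OBJECT_KINDS
    features: list[Feature] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind not in OBJECT_KINDS:
            raise ValueError(f"unknown object kind: {self.kind!r}")


@dataclass
class Relation:
    source: str  # object_id of the first object
    target: str  # object_id of the second object
    label: str  # hypothetical relation name, e.g. "appears in"


@dataclass
class VideoDescriptionRecord:
    object_set: dict[str, VideoObject] = field(default_factory=dict)
    hierarchy: list[tuple[str, str]] = field(default_factory=list)  # (parent_id, child_id)
    relation_graph: list[Relation] = field(default_factory=list)  # labelled directed edges

    def add_object(self, obj: VideoObject) -> None:
        self.object_set[obj.object_id] = obj

    def children(self, parent_id: str) -> list[str]:
        return [child for parent, child in self.hierarchy if parent == parent_id]


# Usage: a whole video, one shot within it, and one object inside that shot.
record = VideoDescriptionRecord()
record.add_object(VideoObject("video", "global"))
record.add_object(
    VideoObject("shot1", "segment",
                [Feature("visual", [FeatureDescriptor("dominant color", "blue")])])
)
record.add_object(VideoObject("anchor", "local"))
record.hierarchy += [("video", "shot1"), ("shot1", "anchor")]
record.relation_graph.append(Relation("anchor", "shot1", "appears in"))
print(record.children("video"))  # ['shot1']
```

Keeping the hierarchy and the relation graph as separate edge lists mirrors the abstract's split between structural containment (26) and other relationships (28).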
NOTES TO FORM PCT/ISA/220 (continued)
The following examples illustrate the manner in which amendments must be explained in the accompanying letter:



1. [Where originally there were 48 claims and after amendment of some claims there are 51]:
"Claims 1 to 29, 31, 32, 34, 35, 37 to 48 replaced by amended claims bearing the same numbers;
claims 30, 33 and 36 unchanged; new claims 49 to 51 added."



2. [Where originally there were 15 claims and after amendment of all claims there are 11]:
"Claims 1 to 15 replaced by amended claims 1 to 11."

3. [Where originally there were 14 claims and the amendments consist in cancelling some claims and in adding new claims]:
"Claims 1 to 6 and 14 unchanged; claims 7 to 13 cancelled; new claims 15, 16 and 17 added." or
"Claims 7 to 13 cancelled; new claims 15, 16 and 17 added; all other claims unchanged."
4. [Where various kinds of amendments are made]:
"Claims 1-10 unchanged; claims 11 to 13, 18 and 19 cancelled; claims 14, 15 and 16 replaced by amended
claim 14; claim 17 subdivided into amended claims 15, 16 and 17; new claims 20 and 21 added."

"Statement under Article 19(1)" (Rule 46.4)

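The amendments may be accompanied by a statement explaining the amendments and indicating any impact that such amendments might have on the description and the drawings (which cannot be amended under Article 19(1)).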

The statement will be published with the international application and the amended claims.
The statement should be brief; it should not exceed 500 words.

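It should not be confused with, and does not replace, the letter indicating the differences between the claims as filed and as amended.

Reference to citations, relevant to a given claim, contained in the international search report may be made only in connection with an amendment of that claim.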



ifOutUusgealS^FnLh; otherwise, i, m^m^S* ^^n^e^^orme' .fp'S 



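Consequence if a demand for international preliminary examination has already been filed?

If, at the time of filing any amendments under Article 19, a demand for international preliminary examination has already been
submitted, the applicant should preferably, at the same time as filing the amendments with the International Bureau, also file a
copy of such amendments with the International Preliminary Examining Authority.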



Consequence with regard to translation of the international application for entry into the national phase?

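The applicant's attention is drawn to the fact that, upon entry into the national phase, a translation of the claims as amended under Article 19 may have to be furnished to the designated/elected Offices, instead of, or in addition to, the translation of the claims as filed.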

For further details on the requirements of each designated/elected Office, see Volume II of the PCT Applicant's Guide.
NOTES TO FORM PCT/ISA/220



These Notes are intended to give the basic instructions concerning the filing of amendments under Article 19. The
Notes are based on the requirements of the Patent Cooperation Treaty and of the Regulations and the Administrative
Instructions under that Treaty. In case of discrepancy between these Notes and those requirements, the latter are applicable.
For more detailed information, see also the PCT Applicant's Guide, a publication of WIPO.

In these Notes, "Article", "Rule" and "Section" refer to the provisions of the PCT, the PCT Regulations and the PCT
Administrative Instructions, respectively.



INSTRUCTIONS CONCERNING AMENDMENTS UNDER ARTICLE 19



The applicant has, after having received the international search report, one opportunity to amend the claims of the
international application. It should however be emphasized that, since all parts of the international application (claims,
description and drawings) may be amended during the international preliminary examination procedure, there is usually
no need to file amendments of the claims under Article 19 except where, e.g., the applicant wants the latter to be published
for the purposes of provisional protection or has another reason for amending the claims before international publication.
Furthermore, provisional protection is available in some States only.

What parts of the international application may be amended?
The claims only.

The description and the drawings may only be amended during international preliminary examination under
Chapter II.



When?

Within 2 months from the date of transmittal of the international search report or 16 months from the priority
date, whichever time limit expires later. It should be noted, however, that the amendments will be considered
as having been received on time if they are received by the International Bureau after the expiration of the
applicable time limit but before the completion of the technical preparations for international publication
(Rule 46.1).
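Under this rule the applicable limit is the later of two months from transmittal of the international search report and 16 months from the priority date. The sketch below checks it against the dates in this file, assuming that the date of mailing of the search report (10 May 2000) counts as the date of transmittal and using the same-day-number month convention of PCT Rule 80.2; `add_months` and `article19_deadline` are hypothetical helper names.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Same day number `months` months later, or the last day of a shorter month."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def article19_deadline(transmittal: date, priority: date) -> date:
    """Later of 2 months from ISR transmittal and 16 months from the priority date."""
    return max(add_months(transmittal, 2), add_months(priority, 16))


# Assumption: the ISR mailing date (10 May 2000) is taken as the transmittal date.
deadline = article19_deadline(date(2000, 5, 10), date(1998, 11, 6))
print(deadline)  # 2000-07-10
```

Here the two-month period (ending 10 July 2000) runs out after the 16-month period (ending 6 March 2000), so it is the one that governs.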



Where not to file the amendments?

The amendments may only be filed with the International Bureau and not with the receiving Office or the
International Searching Authority (Rule 46.2).

Where a demand for international preliminary examination has been/is filed, see below.



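How? Either by cancelling one or more entire claims, by adding one or more new claims or by amending the text of
one or more of the claims as filed.

A replacement sheet must be submitted for each sheet of the claims which, on account of an amendment or amendments,
differs from the sheet originally filed.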



All the claims appearing on a replacement sheet must be numbered in Arabic numerals. Where a claim is
cancelled, no renumbering of the other claims is required. In all cases where claims are renumbered, they must
be renumbered consecutively (Administrative Instructions, Section 205(b)).

What documents must/may accompany the amendments?

Letter (Section 205(b)):

The amendments must be submitted with a letter.

The letter will not be published with the international application and the amended claims. It should not be
confused with the "Statement under Article 19(1)" (see below, under "Statement under Article 19(1)").
The letter must indicate the differences between the claims as filed and the claims as amended. It must, in
particular, indicate, in connection with each claim appearing in the international application (it being understood
that identical indications concerning several claims may be grouped), whether



(i) the claim is unchanged;
(ii) the claim is cancelled;
(iii) the claim is new;
(iv) the claim replaces one or more claims as filed;
(v) the claim is the result of the division of a claim as filed.
From the INTERNATIONAL BUREAU



PCT 

NOTIFICATION CONCERNING 
SUBMISSION OR TRANSMITTAL 
OF PRIORITY DOCUMENT 

(PCT Administrative Instructions, Section 411) 


To: 

TANG, Henry
Baker & Botts, LLP
30 Rockefeller Plaza

New York, NY 10112-0228

ETATS-UNIS D'AMERIQUE


Date of mailing (day/month/year) 
08 February 2000 (08.02.00) 




Applicant's or agent's file reference
32283-PCT 


IMPORTANT NOTIFICATION 


International application No.
PCT/US99/26126 


International filing date (day/month/year) 
05 November 1999 (05.11.99) 


International publication date (day/month/year) 

Not yet published


Priority date (day/month/year) 

06 November 1998 (06.11.98)


Applicant 

THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK et al 



1. The applicant is hereby notified of the date of receipt (except where the letters "NR" appear in the right-hand column) by the
International Bureau of the priority document(s) relating to the earlier application(s) indicated below. Unless otherwise
indicated by an asterisk appearing next to a date of receipt or by the letters "NR" in the right-hand column, the priority
document concerned was submitted or transmitted to the International Bureau in compliance with Rule 17.1(a) or (b).

2. This updates and replaces any previously issued notification concerning submission or transmittal of priority documents. 

3. An asterisk (*) appearing next to a date of receipt in the right-hand column denotes a priority document submitted
or transmitted to the International Bureau but not in compliance with Rule 17.1(a) or (b). In such a case, the attention
of the applicant is directed to Rule 17.1(c), which provides that no designated Office may disregard the priority claim
concerned before giving the applicant an opportunity, upon entry into the national phase, to furnish the priority document
within a time limit which is reasonable under the circumstances.

4. The letters "NR" appearing in the right-hand column denote a priority document which was not received by the International
Bureau or which the applicant did not request the receiving Office to prepare and transmit to the International Bureau,
as provided by Rule 17.1(a) or (b), respectively. In such a case, the attention of the applicant is directed to Rule 17.1(c), which
provides that no designated Office may disregard the priority claim concerned before giving the applicant an opportunity,
upon entry into the national phase, to furnish the priority document within a time limit which is reasonable under the
circumstances.
Priority date                  Priority application No.   Country   Date of receipt of priority document
06 November 1998 (06.11.98)    60/107,463                 US        24 January 2000 (24.01.00)
01 February 1999 (01.02.99)    60/118,020                 US        24 January 2000 (24.01.00)
01 February 1999 (01.02.99)    60/118,027                 US        24 January 2000 (24.01.00)


The International Bureau of WIPO 
34, chemin des Colombettes
1211 Geneva 20, Switzerland


Authorized officer
Carlos Naranjo


Facsimile No. (41-22) 740.14.35


Telephone No. (41-22) 338.83.38
ETATS-UNIS D'AMERIQUE



Datum/Date 



25/05/00 






Application No./Patent No.



99960214.7 - PCT/US99/26126



Applicant

THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF 



NOTE: The following information concerns the steps which you are
required to take for entry into the regional phase before the EPO.
You are strongly advised to read it carefully. Failure to take the
appropriate steps in due time could lead to the application being
deemed withdrawn.



1. European patent application no. 99960214.7 has been allotted to the 
above-mentioned international patent application. 

2. Applicants having neither a residence nor their principal place of 
business within the territory of one of the EPC Contracting States 
may initiate the regional (European) processing of the international 
application themselves, provided they do so before expiry of the 21st 
or 31st month as from the priority date (see Legal Advice of the EPO 
no. 18/92 published in OJ EPO 1992, 58). 

Note, however, that such applicants must be represented in the 
regional phase before the EPO as designated or elected Office by a 
professional representative whose name appears on the EPO list of 
representatives (Arts. 133(2) and 134(1) EPC). 

After expiry of the 21st or 31st month, any procedural steps which 
are taken by the representative of the applicant in the international 
phase, who is not, however, entitled to practise before the EPO, will 
have no effect and will, thus, result in loss of rights. 

The appointment of a professional representative entitled to practise 
before the EPO is possible and advisable at an early stage during the
international phase (any time after the 14th month from the priority 
date) in view of representing applicants before the EPO as designated 
or elected Office. 



EPO FORM 1201 (08.98) 







Therefore, an appointment in due time is strongly recommended, if it 
is intended that this representative should already act for entry 
into the regional phase; otherwise all communications will be
forwarded from the EPO directly to the applicant.

3. Applicants having their address within the territory of one of the 
EPC Contracting States are not obliged to appoint a professional 
representative entitled to practise before the EPO to represent them 
in the regional phase where the EPO is designated or elected Office. 

Note that due to the complexity of the proceedings, applicants are 
strongly advised to appoint such representative. Please keep in mind 
that, if a professional representative before the EPO has already 
acted for the applicant during the international phase, this
representative is not automatically regarded as the representative for the
regional phase. 

4. Applicants and professional representatives are recommended to
file EPO Form 1200 (available free of charge from the EPO) for entry
into the regional phase. The use of Form 1200, however, is not
mandatory.

5. FOR ENTRY INTO THE REGIONAL PHASE BEFORE THE EPO the following 
procedural steps must be taken. (Note that non-completion or
ineffective completion of the required steps will result in loss of
rights or other disadvantage.) 

5.1 Within 21 months from the date of filing or (where applicable) 
from the earliest priority date if the EPO acts as DESIGNATED 
OFFICE pursuant to Article 22(1) PCT: 

a) Filing of a translation of the international application in an 
EPO official language if the International Bureau did not 
publish the application in one of those languages (Art. 22(1) 
PCT and Rule 104b(1)(a) EPC).

Note that if such translation is not filed in due time, the 
international application before the EPO is deemed withdrawn 
(Art. 24(1)(iii) PCT).

b) Payment of the national fee [national basic fee, the designation
fee for each State designated, (where applicable) the
claims fees for the eleventh and each subsequent claim] and
the search fee, where a supplementary European search report
has to be drawn up (Rule 104b(1)(b), (c) EPC).

Upon expiry of the 21-month time limit provided for in Rule
104b(1) EPC, the EPO sends the applicant or his appointed professional
representative the communication pursuant to Rule 85a(1)
EPC (Form 1217) and (where applicable) Rule 69(1) EPC (Form 1205),
unless it has been notified of its designation as elected Office
in due time.

5.2 Within 31 months from the date of filing or (where applicable)
from the earliest priority date if the EPO acts as ELECTED OFFICE
pursuant to Article 39(1)(a) PCT:

a) Filing of a translation as under 5.1 a). 

b) Payment of the fees as under 5.1 b). 

c) Filing of the written request for examination and payment of 
the examination fee (Rule 104b(1)(d) EPC).

Note that both acts must be performed in due time; otherwise
the European patent application shall be deemed to be
withdrawn (Art. 94(3) EPC).

d) Payment of the renewal fee for the third year, if due before 
the expiration of the 31-month term (Rule 104b(1)(e) EPC).

6. The amounts of the fees (and equivalents in all currencies of the
contracting states of the EPC) are regularly published in the
Official Journal of the EPO.

If the national basic fee, the designation fees or the search fee
have not been paid in time, they may still be validly paid within a
grace period of one month as from notification of an EPO
communication (Rule 85a(1) EPC).

If the renewal fee is not paid in time, it may still be validly paid 
within six months from the due date (Art. 86(2) EPC). 

In both cases, a surcharge is due. 

7. The international search report under Article 18 PCT (or the
declaration under Article 17(2)(a) PCT) has been published by the
International Bureau. The date of publication can be ascertained from the
copy of the published application documents sent by the International
Bureau or from the international search report, if published
separately. This publication takes the place of the mention of the
publication of the European search report (Art. 157(1) EPC).

A request for examination, comprising a written request and payment 
of the examination fee, must be filed up to the end of six months 
after the above date. 
However, in view of Article 22 or 39 PCT in conjunction with Rule
104b(1)(d) EPC, the period for filing the request for examination
does not expire before 21 or 31 months, respectively, from the date
of filing (where applicable, the earliest priority date).

A period of grace of one month from notification of an EPO
communication is available in case either or both of the above acts have not
been performed in time. In that case, a surcharge is due (Rule 85b
EPC).

This information letter is addressed by the EPO to the agent, if any, 
having acted for the applicant during the international phase of the 
application. 

Any further notifications on procedural matters will be addressed to 
the applicant, respectively his European representative, if the 
appointment of the latter has been communicated to the EPO in due 
time. 

For further details see the information for PCT applicants concerning 
time limits and procedural steps before the EPO as a designated and 
as an elected Office under the PCT (published as Supplement No. 1 to 
OJ EPO 12/1992, with changes published in OJ EPO 1994, 131). 

Concerning the list of professional representatives before the 
European Patent Office (see points 2 and 3), EPO Form 1200 (see 
point 4) and the actual fees to be paid (see point 6) we refer to 
the EPO's Internet address: 
http://www.european-patent-office.org.
PCT



NOTIFICATION OF RECEIPT OF 
RECORD COPY 

(PCT Rule 24.2(a)) 



From the INTERNATIONAL BUREAU

To: BAKER BOTTS L.L.P. 

00 FEB -7 PHI* 05 

TANG, Henry 
Baker & Botts, 
30 Rockefeller 
New York, NY 



ETATS-UNIS D'AMERIQUE




Date of mailing (day/month/year) 
26 January 2000 (26.01.00)


IMPORTANT NOTIFICATION 


Applicant's or agent's file reference
32283-PCT 


International application No. 
PCT/US99/26126 



The applicant is hereby notified that the International Bureau has received the record copy of the international application as 
detailed below.

Name(s) of the applicant(s) and State(s) for which they are applicants: 

THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK et al (for all designated States except US)
PAEK, Seungyup et al (for US) 

International filing date: 05 November 1999 (05.11.99)

Priority date(s) claimed:
06 November 1998 (06.11.98)
01 February 1999 (01.02.99)
01 February 1999 (01.02.99)

Date of receipt of the record copy by the International Bureau: 06 January 2000 (06.01.00)

List of designated Offices:



AP: GH, GM, KE, LS, MW, SD, SL, SZ, TZ, UG, ZW
EA: AM, AZ, BY, KG, KZ, MD, RU, TJ, TM
EP: AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE
OA: BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG
National: AE, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE, DK, DM, EE, ES, FI, GB,
GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK,
MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SW
ZW



The International Bureau of WIPO 


Authorized officer:


34, chemin des Colombettes


Beatriz Morariu


1211 Geneva 20, Switzerland 




Facsimile No. (41-22) 740.14.35 


Telephone No. (41-22) 338.83.38



Form PCT/IB/301 (July 1998)



003070418 





ATTENTION 

The applicant should carefully check the data appearing in this Notification. In case of any discrepancy between these data 
and the indications in the international application, the applicant should immediately inform the International Bureau. 

In addition, the applicant's attention is drawn to the information contained in the Annex, relating to: 


[X] time limits for entry into the national phase




[ ] confirmation of precautionary designations




[X] requirements regarding priority documents




A copy of this Notification is being sent to the receiving Office and to the International Searching Authority. 












Form PCT/IB/301 (continuation sheet) (July 1998), 003070418



INFORMATION ON TIME LIMITS FOR ENTERING THE NATIONAL PHASE

The applicant is reminded that the "national phase" must be entered before each of the designated Offices indicated in the 
Notification of Receipt of Record Copy (Form PCT/IB/301) by paying national fees and furnishing translations, as prescribed by
the applicable national laws. 

The time limit for performing these procedural acts is 20 MONTHS from the priority date or, for those designated States
which the applicant elects in a demand for international preliminary examination or in a later election, 30 MONTHS from the
priority date, provided that the election is made before the expiration of 19 months from the priority date. Some designated (or
elected) Offices have fixed time limits which expire even later than 20 or 30 months from the priority date. In other Offices an 
extension of time or grace period, in some cases upon payment of an additional fee, is available. 

In addition to these procedural acts, the applicant may also have to comply with other special requirements applicable in 
certain Offices. It is the applicant's responsibility to ensure that the necessary steps to enter the national phase are taken in a 
timely fashion. Most designated Offices do not issue reminders to applicants in connection with the entry into the national 
phase. 



For detailed information about the procedural acts to be performed to enter the national phase before each designated 
Office, the applicable time limits and possible extensions of time or grace periods, and any other requirements, see the relevant
Chapters of Volume II of the PCT Applicant's Guide. Information about the requirements for filing a demand for international 
preliminary examination is set out in Chapter IX of Volume I of the PCT Applicant's Guide.

GR and ES became bound by PCT Chapter II on 7 September 1996 and 6 September 1997, respectively, and may, therefore,
be elected in a demand or a later election filed on or after 7 September 1996 and 6 September 1997, respectively, regardless of 
the filing date of the international application. (See second paragraph above.) 

Note that only an applicant who is a national or resident of a PCT Contracting State which is bound by Chapter II has 
the right to file a demand for international preliminary examination.

CONFIRMATION OF PRECAUTIONARY DESIGNATIONS 

This notification lists only specific designations made under Rule 4.9(a) in the request. It is important to check that these
designations are correct. Errors in designations can be corrected where precautionary designations have been made under
Rule 4.9(b). The applicant is hereby reminded that any precautionary designations may be confirmed according to Rule 4.9(c)
before the expiration of 15 months from the priority date. If a designation is not confirmed, it will automatically be regarded as
withdrawn by the applicant. There will be no reminder and no invitation. Confirmation of a designation consists of the filing of a notice
specifying the designated State concerned (with an indication of the kind of protection or treatment desired) and the payment
of the designation and confirmation fees. Confirmation must reach the receiving Office within the 15-month time limit.

REQUIREMENTS REGARDING PRIORITY DOCUMENTS 

For applicants who have not yet complied with the requirements regarding priority documents, the following is recalled. 

Where the priority of an earlier national, regional or international application is claimed, the applicant must submit a copy
of the said earlier application, certified by the authority with which it was filed ("the priority document"), to the receiving Office
(which will transmit it to the International Bureau) or directly to the International Bureau, before the expiration of 16 months from
the priority date, provided that any such priority document may still be submitted to the International Bureau before the date of
international publication of the international application, in which case that document will be considered to have been received
by the International Bureau on the last day of the 16-month time limit (Rule 17.1(a)).

Where the priority document is issued by the receiving Office, the applicant may, instead of submitting the priority
document, request the receiving Office to prepare and transmit the priority document to the International Bureau. Such request
must be made before the expiration of the 16-month time limit and may be subjected by the receiving Office to the payment
of a fee (Rule 17.1(b)).

If the priority document concerned is not submitted to the International Bureau or if the request to the receiving Office 
to prepare and transmit the priority document has not been made (and the corresponding fee, if any, paid) within the applicable 
time limit indicated under the preceding paragraphs, any designated State may disregard the priority claim, provided that no 
designated Office may disregard the priority claim concerned before giving the applicant an opportunity to furnish the priority
document within a time limit which is reasonable under the circumstances. 

Where several priorities are claimed, the priority date to be considered for the purposes of computing the 16-month time
limit is the filing date of the earliest application whose priority is claimed. 
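This application claims three priorities, so the 16-month limit runs from the earliest of them. A minimal sketch, again assuming the same-day-number month convention of PCT Rule 80.2 and taking the priority dates, application numbers and the 24 January 2000 receipt date from the notification concerning priority documents in this file; `add_months` is a hypothetical helper name.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Same day number `months` months later, or the last day of a shorter month."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# (priority date, US provisional application No.) as listed in this file
priority_claims = [
    (date(1998, 11, 6), "60/107,463"),
    (date(1999, 2, 1), "60/118,020"),
    (date(1999, 2, 1), "60/118,027"),
]

# The 16-month limit of Rule 17.1(a) runs from the earliest priority date claimed.
earliest = min(claimed for claimed, _ in priority_claims)
limit = add_months(earliest, 16)

# Date of receipt by the International Bureau reported for all three documents.
received = date(2000, 1, 24)
print(limit, received <= limit)  # 2000-03-06 True
```

On these figures all three priority documents reached the International Bureau before the 16-month limit of 6 March 2000.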






Form PCT/IB/301 (Annex) (July 1998)






From the INTERNATIONAL BUREAU 



PCT 

INFORMATION CONCERNING ELECTED 
OFFICES NOTIFIED OF THEIR ELECTION 

(PCT Rule 61.3) 



Date of mailing (day/month/year) 
28 August 2000 (28.08.00) 



To: 



TANG, Henry 

Baker & Botts, LLP
30 Rockefeller Plaza

New York, NY 10112-0228
ETATS-UNIS D'AMERIQUE




International application No. 
PCT/US99/26126 



International filing date (day/month/year) 
05 November 1999 (05.11.99) 



Priority date (day/month/year) 

06 November 1998 (06.11.98) 






Applicant 



THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK et al 



1. The applicant is hereby informed that the International Bureau has, according to Article 31(7), notified each of the following 
Offices of its election: 

AP :GH,GM,KE,LS,MW,SD,SL,SZ,TZ,UG,ZW

EPrAT^E^CH^D^D^ES^I^R^GB^GRJE^T^U^aNUP^SE 

National lAU^BG^^CAXN^DEJUP^KP^K^MN^CNZ^PURCRU^^SK.US 

2. The following Offices have waived the requirement for the notification of their election; the notification will be sent to them 
by the International Bureau only upon their request: 

EA :AM,AZ,BY,KG,KZ,MD,RU,TJ,TM

OA iB^BJ^CF^G^I^M^A^GN^W^UMR^^SNJDJG 

National rAE^AUAM^AZ^A^B^Y^H^R^CU^^DM^E^S^I^B^aGE^GH^M, 

HRmiDJNJS,KE,KG,KZXC,LK,LR,^^ 

SI,SL,TJ JMJR^TZ^A^G^Z^N^UZA^W 

3. The applicant is reminded that he must enter the "national phase" before the expiration of 30 months from the priority date
before each of the Offices listed above. This must be done by paying the national fee(s) and furnishing, if prescribed, a
translation of the international application (Article 39(1)(a)), as well as, where applicable, by furnishing a translation of any
annexes of the international preliminary examination report (Article 36(3)(b) and Rule 74.1).

Some offices have fixed time limits expiring later than the above-mentioned time limit. For detailed information about the
applicable time limits and the acts to be performed upon entry into the national phase before a particular Office, see Volume II 
of the PCT Applicant's Guide. 

The entry into the European regional phase is postponed until 31 months from the priority date for all States designated for 
the purposes of obtaining a European patent.






The International Bureau of WIPO 


Authorized officer: 


34, chemin des Colombettes 


Manu Berrod


1211 Geneva 20, Switzerland 




Facsimile No. (41-22) 740.14.35 


Telephone No. (41-22) 338.83.38



Form PCT/IB/332 (September 1997) 



3491878 



To:

HENRY TANG 
BAKER & BOTTS, LLP 
30 ROCKEFELLER PLAZA 
NEW YORK NY 10112-0228 


NOTIFICATION OF RECEIPT
OF DEMAND BY COMPETENT INTERNATIONAL
PRELIMINARY EXAMINING AUTHORITY

(PCT Rules 59.3(e) and 61.1(b), first sentence
and Administrative Instructions, Section 601(a))


Date of mailing
(day/month/year) 26 JUL 2000


Applicant's or agent's file reference 
32283-PCT 


IMPORTANT NOTIFICATION 



PCT/US99/26126 



05 NOV 99 



Priority date (day/month/year)
06 NOV 98 



Applicant



THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF 
NEW YORK 



1. The applicant is hereby notified that this International Preliminary Examining Authority considers the following date as the 
date of receipt of the demand for international preliminary examination of the international application: 

2. That date of receipt is: 

the actual date of receipt of the demand by this Authority (Rule 61.1(b)). 

the actual date of receipt of the demand on behalf of this Authority (Rule 59.3(e)).

the date on which this Authority has, in response to the invitation to correct defects in the demand (Form
PCT/IPEA/404), received the required corrections.

3. □ ATTENTION: That date of receipt is AFTER the expiration of 19 months from the priority date. Consequently, the
election(s) made in the demand does (do) not have the effect of postponing the entry into the national phase until
30 months from the priority date (or later in some Offices) (Article 39(1)). Therefore, the acts for entry into the
national phase must be performed within 20 months from the priority date (or later in some Offices) (Article 22).
For details, see the PCT Applicant's Guide, Volume II.

□ (If applicable) This notification confirms the information given by telephone, facsimile transmission or in person on:



4. Only where paragraph 3 applies, a copy of this notification has been sent to the International Bureau.




Name and mailing address of the IPEA/US 
Assistant Commissioner for Patents 
Box PCT 

Washington, D.C. 20231, Attn: IPEA/US

Facsimile No. 



Authorized officer 

Jeannette Washington
PCT Operations - IAPD Team 1
Tdeptff03)*ft5-3687 (703) 305-3230 ffAy 



Form PCT/IPEA/402 (July 1998) 



Certification under 37 CFR 1.10 (if applicable) 



EJ339573419US 



Express Mail mailing number 



5 June 2000 



Date of Deposit 



I hereby certify that the application/correspondence attached hereto is being deposited with the United States Postal Service
"Express Mail Post Office to Addressee" service under 37 CFR 1.10 on the date indicated above and is addressed to Assistant
Commissioner for Patents, Washington, D.C. 20231.




Leroy Chick 



Typed or printed name of person mailing correspondence 



II. New International Application 




Earliest priority date 
(Day/Month/Year) 



SCREENING DISCLOSURE INFORMATION: In order to assist in screening the accompanying international 
application for purposes of determining whether a license for foreign transmittal should and could be granted and for 
other purposes, the following information is supplied. (Note: check as many boxes as apply): 

The invention disclosed was not made in the United States. 



international application. (NOTE: priority to these applications may or may not be claimed on form PCT/RO/101
(Request) and this listing does not constitute a claim for priority). 



A. 


□ 


a 


□ 


c. 


□ 



application no.


filed on 




application no.


filed on 





application(s) identified in paragraph C.
E. □ The present international application □ contains additional subject matter not found in the prior U.S. application(s)
identified in paragraph C. above. The additional subject matter is found on pages



and □ DOES NOT ALTER □ MIGHT BE CONSIDERED TO ALTER the general nature of the invention in a 
manner which would require the U.S. application to have been made available for inspection by the appropriate defense 
agencies under 35 U.S.C. 181 and 37 CFR 5.1. See 37 CFR 5.15.



III. □ A Response to an Invitation from the RO/US. The following document(s) is (are) enclosed:
□ A Request for an Extension of Time to File a Response
A Power of Attorney (General or Regular)
Replacement pages: 



A. 
B. 
C. 



□ 
□ 



pages 




of the request (PCT/RO/101) 


pages 




of the figures


pages 




of the description 


pages 




of the abstract


pages 




of the claims 





D. Submission of Priority Documents 



Priority document 



Priority document 



E. □ Fees as specified on attached Fee Calculation Sheet, form PCT/RO/101 annex



IV. □ A Request for Rectification under PCT Rule 91   □ A Petition



□ A Sequence Listing Diskette



V. Other (please specify): Demand for International Preliminary Examination (6 sheets), Fee Calculation Sheet, a postcard, and a
check in the amount of $643.



□ Applicant


Paul A. Ragusa 


☒ Attorney/Agent (Reg. No.)
38,587


Typed name of signer 



□ Common Representative


Signature



The person 
signing this 
form is the: 



PTO-1382(Rev. 4-1995) 



Copyright 1996 Lcgaboft 



PCT 

DEMAND 

under Article 31 of the Patent Cooperation Treaty:
The undersigned requests that the international application specified below be the subject of 
international preliminary examination according to the Patent Cooperation Treaty and 
hereby elects all eligible States (except where otherwise indicated). 



CHAPTER II



For International Preliminary Examining Authority use only 



Identification of IPEA 



Box No. I IDENTIFICATION OF THE INTERNATIONAL APPLICATION


Applicant's or agent's file reference 
32283-PCT 


International application No. 
PCT/US99/26126 


International filing date (day/month/year) 
05 November 1999 ( 05.11.99 ) 


(Earliest) Priority date (day/month/year) 
06 November 1998 ( 06.11.98 ) 


Title of invention 

VIDEO DESCRIPTION SYSTEM AND METHOD 



Box No. II APPLICANT(S)



Name and address: (Family name followed by given name; for a legal entity, full official
designation. The address must include postal code and name of country.)

THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK 
Broadway and 116th Street
New York, NY 10027
US 



Telephone No.: 



Facsimile No.: 



Teleprinter No.: 



State (that is, country) of nationality: 
US 



State (that is, country) of residence: 
US 



Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and
name of country.)



AT&T 

AT&T Labs, Room 3-237 
100 Schultz Drive-Middletown
Redbank, NJ 07701 
US 



State (that is, country) of nationality: 
US 



State (that is, country) of residence: 
US 



Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and
name of country.)

IBM 

T.J. Watson Research Center 
30 Saw Mill River Road 
Hawthorne, NY 10532 
US 



State (that is, country) of nationality: 
US 



State (that is, country) of residence: 
US 



☒ Further applicants are indicated on a continuation sheet.



Form PCT/IPEA/401 (first sheet) (July 1998; reprint January 2000)



LegalStar 2000, Form pctdem. See Notes to the demand form.



Continuation of Box No. II APPLICANT(S) 


If none of the following sub-boxes is used, this sheet is not to be included in the demand. 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 
name of country.)


PAEK, SEUNGYUP
530 Riverside Drive, Apt. 6J 
New York, NY 10027 
US 




State (that is, country) of nationality: 
KR 


State (that is, country) of residence: 
US 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 
name of country.) 


BENITEZ, ANA 

400 West 119th Street, Apt. 9F

New York, NY 10027 

US 




State (that is, country) of nationality: 
ES 


State (that is, country) of residence:
US 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 
name of country.) 


CHANG, SHIH-FU

560 Riverside Drive, Apt. 18K 

New York, NY 10027 

US 




State (that is, country) of nationality: 
TW 


State (that is, country) of residence: 
US 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 
name of country.) 


State (that is, country) of nationality: 


State (that is, country) of residence:


☒ Further applicants are indicated on another continuation sheet.



Form PCT/IPEA/401 (continuation sheet) (July 1998; reprint January 2000) LegalStar 2000, Form pctdem. See Notes to the demand form.





Continuation of Box No. II APPLICANT(S) 


If none of the following sub-boxes is used, this sheet is not to be included in the demand. 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 
name of country.) 


PURI, ATUL

AT&T Labs, Room 3-237 
100 Schultz Drive-Middletown
Redbank, NJ 07701 
US 




State (that is, country) of nationality: 

US


State (that is, country) of residence: 
US 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 
name of country.) 


HUANG, QIAN 
AT&T Labs, Room 3-237 
100 Schultz Drive-Middletown 
Redbank, NJ 07701 

US




State (that is, country) of nationality: 

US


State (that is, country) of residence: 
US 



Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 
name of country.) 


LI, CHUNG-SHENG 
50 Croton Avenue, Apt. 2C 
Ossining, NY 10562 
US 




State (that is, country) of nationality: 
US 


State (that is, country) of residence:
US 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 
name of country.) 


SMITH, JOHN R. 

275 West 96th Street, Apt. 15B 

New York, NY 10025 

US 




State (that is, country) of nationality:
US 


State (that is, country) of residence:
US 


☒ Further applicants are indicated on another continuation sheet.






Continuation of Box No. II APPLICANT(S) 


If none of the following sub-boxes is used, this sheet is not to be included in the demand.


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 


name of country.) 




BERGMAN, LAWRENCE 




IBM 




T.J. Watson Research Center




30 Saw Mill River Road 




Hawthorne, NY 10532 




US 




State (that is, country) of nationality:


State (that is, country) of residence: 


US 


US 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 


name of country.) 




State (that is, country) of nationality: 


State (that is, country) of residence: 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 


name of country.) 




State (that is, country) of nationality: 


State (that is, country) of residence: 


Name and address: (Family name followed by given name; for a legal entity, full official designation. The address must include postal code and 


name of country.) 




State (that is, country) of nationality: 


State (that is, country) of residence:


□ Further applicants are indicated on another continuation sheet.






Box No. III AGENT OR COMMON REPRESENTATIVE; OR ADDRESS FOR CORRESPONDENCE



The following person is: ☒ agent □ common representative

and has been appointed earlier and represents the applicant(s) also for international preliminary examination. 
APPENDIX A: Document Type Definition of Video Description Scheme

videods.dtd: 



<!-- Video DS -->

<!-- Entities are like macros. They can be referenced by using the notation "%EntityName;" -->

<!-- For clarity, we have chosen not to reference them in this DTD, although some of them
are derived from the Image DTD -->

<!-- Some of the elements in this DTD are inherited from the image DTD: object hierarchy, 
entity_relation_graph, etc. We have included some of them in this annex. -->
<!ENTITY % video_object_elements "
vid_obj_media_features?,
vid_obj_semantic_features?,
vid_obj_visual_features?,
vid_obj_temporal_features?">

<!ENTITY % ref_video_object_attributes "
%image_object_attributes;">

<!ENTITY % only_vid_obj_media_features_elements "
bit_rate?">

<!ENTITY % vid_obj_media_features_elements "
%img_obj_media_features_elements;
%only_vid_obj_media_features_elements;">

<!ENTITY % vid_obj_semantic_features_elements "
%img_obj_semantic_features_elements;">

<!ENTITY % only_vid_obj_visual_features_elements "
video_scl?, visual_sprite?, transition?, camera_motion?, size?, key_frame*">

<!ENTITY % vid_obj_visual_features_elements "
%img_obj_visual_features_elements;
%only_vid_obj_visual_features_elements;">

<!ENTITY % vid_obj_temporal_features_elements "
time?">
<!ELEMENT video (video_object_set, eventhierarchy*, entity_relation_graph*)>
<!ATTLIST video 

id ID #IMPLIED> 



<!ELEMENT video_object_set (video_object+)> 

<!ELEMENT video_object (vid_obj_media_features?, vid_obj_semantic_features?, 

vid_obj_visual_features?, vid_obj_temporal_features?)> 

<!ATTLIST video_object 

type (LOCAL|SEGMENT|GLOBAL) #REQUIRED
id ID #IMPLIED

object_ref IDREF #IMPLIED
object_node_ref IDREFS #IMPLIED
entity_node_ref IDREFS #IMPLIED> 

<!ELEMENT vid_obj_media_features ( 
bitrate?, 

location?, file_format?, filesize?, resolution?, modality_transcoding?)>

<!ELEMENT vid_obj_semantic_features (text_annotation?, who?, what_action?, where?, why?,
when?)>

<!ELEMENT vid_obj_visual_features ( 

image_scl?, color?, texture?, shape?, size?, position?, motion?, 

video_scl?, visual_sprite?, transition?, camera_motion?, size?, key_frame*)> 

<!ELEMENT vid_obj_temporal_features (time?)> 

<!-- The object hierarchy and the entity relation graph are defined in the Image DS (Proposal
#480). We include them in this DTD for convenience. -->

<!-- Object hierarchy element -->
<!-- The attribute type is the hierarchy binding type -->
<!ELEMENT object_hierarchy (object_node)> 
<!ATTLIST object_hierarchy 

id ID #IMPLIED 

type CDATA #IMPLIED> 
<!ELEMENT object_node (object_node*)>
<!ATTLIST object_node 

id ID #IMPLIED

object_ref IDREF #REQUIRED>

<!-- Entity relation graph element -->

<!-- Possible types of entity relations and entity relation graphs:

- Spatial: topological, directional

- Temporal: topological, directional

- Semantic -->

<!ELEMENT entity_relation_graph (entity_relation+)> 
<!ATTLIST entity_relation_graph

id ID #IMPLIED 

type CDATA #IMPLIED> 

<!ELEMENT entity_relation (relation, (entity_node | entity_node_set | entity_relation)*)>
<!ATTLIST entity_relation 

type CDATA #IMPLIED> 

<!ELEMENT relation (#PCDATA | code)*> 

<!ELEMENT entity_node (#PCDATA)> 
<!ATTLIST entity_node 

id ID #IMPLIED 

object_ref IDREF #REQUIRED> 

<!ELEMENT entity_node_set (entity_node+)>

<!-- External image DS DTD -->

<!ENTITY % image_ds SYSTEM "image_ds.dtd">

%image_ds; 

<!-- External scalable video DTD -->

<!ENTITY % video_scl SYSTEM "video_scl.dtd">

%video_scl; 

<!-- External visual sprite DTD -->

<!ENTITY % visual_sprite SYSTEM "visual_sprite.dtd">

%visual_sprite; 
<!-- External transition DTD -->

<!ENTITY % transition SYSTEM "transition.dtd">

%transition; 

<!-- External camera motion DTD -->

<!ENTITY % camera_motion SYSTEM "camera_motion.dtd">
%camera_motion; 

<!ELEMENT key_frame (size_dimensions?, time_instant?)> 
<!-- Video DS end -->
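To make the `video_object` content model concrete, here is a minimal sketch that assumes Python's standard `xml.etree.ElementTree` module. It builds a small `video` instance and checks that the optional feature children of a `video_object` each appear at most once and in the order the DTD declares. The helpers `make_video_object` and `follows_content_model`, and the ids `video0` and `obj1`, are hypothetical. ElementTree does not validate against a DTD, so both rules are checked by hand here.

```python
import xml.etree.ElementTree as ET

# Declared order of the optional children of <video_object> in videods.dtd:
# (vid_obj_media_features?, vid_obj_semantic_features?,
#  vid_obj_visual_features?, vid_obj_temporal_features?)
VIDEO_OBJECT_MODEL = [
    "vid_obj_media_features",
    "vid_obj_semantic_features",
    "vid_obj_visual_features",
    "vid_obj_temporal_features",
]
# Enumerated values of the #REQUIRED "type" attribute.
OBJECT_TYPES = {"LOCAL", "SEGMENT", "GLOBAL"}


def make_video_object(obj_id, obj_type, feature_tags):
    """Build a <video_object> element (hypothetical helper)."""
    if obj_type not in OBJECT_TYPES:
        raise ValueError(f"type must be one of {sorted(OBJECT_TYPES)}")
    obj = ET.Element("video_object", {"id": obj_id, "type": obj_type})
    for tag in feature_tags:
        ET.SubElement(obj, tag)
    return obj


def follows_content_model(elem, model):
    """True if every child is in `model`, occurs once, and respects its order."""
    position = -1
    for child in elem:
        if child.tag not in model:
            return False
        index = model.index(child.tag)
        if index <= position:  # repeated or out of order
            return False
        position = index
    return True


video = ET.Element("video", {"id": "video0"})
object_set = ET.SubElement(video, "video_object_set")
object_set.append(make_video_object(
    "obj1", "SEGMENT", ["vid_obj_visual_features", "vid_obj_temporal_features"]))

print(ET.tostring(video, encoding="unicode"))
```

A validating XML parser would enforce both rules directly from the DTD. The hand-written check only covers these two constraints and ignores the nested feature content.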



location.dtd: 



<!-- Description of resources' location -->

<!-- Objects, images, videos can be located/accessed at different locations -->
<!ELEMENT location (location_site*)>

<!-- One location site -->

<!ELEMENT location_site EMPTY>

<!ATTLIST location_site 

href CDATA #REQUIRED

title CDATA #IMPLIED> 

<!ELEMENT code (location*)> 
<!ATTLIST code 

type (EXTRACTION|DISTANCE) "EXTRACTION"

language (C|JAVA|PERL) #REQUIRED 

version CDATA #REQUIRED> 

<!-- Description of resources' storage location -->
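A `code` element points to the code that extracts a feature or computes a distance, and that code may be mirrored at several `location_site` entries. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`, that reads a `code` element and applies the attribute rules from location.dtd by hand. ElementTree does not read the external DTD, so the declared default `EXTRACTION` is supplied in code. The function name `describe_code` and the example.com URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Enumerations and default copied from the <code> ATTLIST in location.dtd.
CODE_TYPES = {"EXTRACTION", "DISTANCE"}
CODE_LANGUAGES = {"C", "JAVA", "PERL"}
DEFAULT_CODE_TYPE = "EXTRACTION"


def describe_code(code_elem):
    """Return (type, language, version, hrefs) for a <code> element.

    Hypothetical helper: it mirrors the DTD's attribute rules manually.
    """
    code_type = code_elem.get("type", DEFAULT_CODE_TYPE)
    language = code_elem.get("language")
    version = code_elem.get("version")
    if code_type not in CODE_TYPES:
        raise ValueError(f"unknown code type: {code_type!r}")
    if language not in CODE_LANGUAGES:
        raise ValueError(f"language must be one of {sorted(CODE_LANGUAGES)}")
    if version is None:
        raise ValueError("the version attribute is #REQUIRED")
    # <code> holds location*, and each <location> holds location_site*.
    hrefs = [site.get("href") for site in code_elem.iterfind("location/location_site")]
    return code_type, language, version, hrefs


# Hypothetical example: a color-extraction routine available from two sites.
snippet = """
<code language="C" version="1.0">
  <location>
    <location_site href="http://example.com/color_extract.c" title="primary"/>
    <location_site href="http://mirror.example.com/color_extract.c"/>
  </location>
</code>
"""
print(describe_code(ET.fromstring(snippet.strip())))
```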



video_scl.dtd:



<!-- Video scalability features -->
<!ELEMENT video_scl (video_sclobj, code*)>

<!ELEMENT video_sclobj (vid_obj_scltype, vid_obj_mode, vid_obj_numlayers, codref,
subsamp_factor, vid_obj_shape?)>
<!ELEMENT vid_obj_scltype EMPTY>
<!ATTLIST vid_obj_scltype 

typeinfo (DATPARTITION|SPATIAL|TEMPORAL|SNR) #REQUIRED>

<!-- Video scalability (subtype) mode features -->
<!ELEMENT vid_obj_mode EMPTY>
<!ATTLIST vid_obj_mode 

modeinfo CDATA #REQUIRED> 

<!ELEMENT vid_obj_numlayers EMPTY>
<!ATTLIST vid_obj_numlayers
numval CDATA #REQUIRED>

<!ELEMENT codref EMPTY>
<!ATTLIST codref

layernum CDATA #REQUIRED>

<!-- subsampling ratio n/m for horizontal and vertical directions -->
<!ELEMENT subsamp_factor EMPTY>
<!ATTLIST subsamp_factor

hor_factor_n CDATA #REQUIRED

hor_factor_m CDATA #REQUIRED 

vert_factor_n CDATA #REQUIRED 

vert_factor_m CDATA #REQUIRED> 

<!ELEMENT vid_obj_shape (shape)>
<!-- scalability features end -->
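The `subsamp_factor` element stores horizontal and vertical ratios n/m as CDATA strings. The following minimal sketch applies such ratios to a frame size. The DTD only calls n/m the "subsampling ratio", so the sketch assumes that the layer's dimensions equal the reference dimensions multiplied by n/m. It also rejects ratios that would give fractional pixel sizes. The function name `scaled_size` and the 352x288 example values are hypothetical.

```python
from fractions import Fraction


def scaled_size(width, height, hor_n, hor_m, vert_n, vert_m):
    """Scale a frame size by the horizontal and vertical n/m ratios.

    Assumption: layer size = reference size * n/m. Attribute values arrive
    as CDATA strings, so they are converted with int().
    """
    hor_n, hor_m, vert_n, vert_m = (int(v) for v in (hor_n, hor_m, vert_n, vert_m))
    if hor_m == 0 or vert_m == 0:
        raise ValueError("m must be non-zero")
    new_w = width * Fraction(hor_n, hor_m)
    new_h = height * Fraction(vert_n, vert_m)
    # Reject ratios that would produce fractional pixel dimensions.
    if new_w.denominator != 1 or new_h.denominator != 1:
        raise ValueError("ratio does not give an integer frame size")
    return int(new_w), int(new_h)


# Hypothetical attribute values, as they might be read from <subsamp_factor>.
attrs = {"hor_factor_n": "1", "hor_factor_m": "2",
         "vert_factor_n": "1", "vert_factor_m": "2"}
print(scaled_size(352, 288, attrs["hor_factor_n"], attrs["hor_factor_m"],
                  attrs["vert_factor_n"], attrs["vert_factor_m"]))  # prints (176, 144)
```

The sketch uses `Fraction` rather than floating point so that the integrality check is exact.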



visual_sprite.dtd:

<!-- visual sprite features -->
<!ELEMENT visual_sprite (vis_spriteobj)>
<!ELEMENT vis_spriteobj (vis_spriteobj_info, code*)>

<!ELEMENT vis_spriteobj_info (vis_sprite_dim, vis_sprite_shape, vis_sprite_trajectory,
vis_sprite_warp, vis_sprite_brightness, vis_sprite_texture)>

<!ELEMENT vis_sprite_dim (sprite_size, sprite_num_pts, sprite_coord)>
<!ELEMENT sprite_size EMPTY>

<!ATTLIST sprite_size

sprite_width CDATA #REQUIRED 
sprite_height CDATA #REQUIRED> 

<!ELEMENT sprite_num_pts EMPTY> 
<!ATTLIST sprite_num_pts 

num_pts CDATA #REQUIRED> 

<!ELEMENT sprite_coord EMPTY> 

<!ATTLIST sprite_coord 

xcoord CDATA #REQUIRED 
ycoord CDATA #REQUIRED> 

<!ELEMENT vis_sprite_shape (shape)>

<!ELEMENT vis_sprite_trajectory (motion)>

<!ELEMENT vis_sprite_warp EMPTY>
<!ATTLIST vis_sprite_warp 

num_pts CDATA #REQUIRED> 

<!ELEMENT vis_sprite_brightness EMPTY>

<!ATTLIST vis_sprite_brightness 

avgbright CDATA #REQUIRED 
varbright CDATA #REQUIRED>

<!ELEMENT vis_sprite_texture (texture)>



transition.dtd:



<!-- Transition features -->

<!ELEMENT transition (transition_descl*)>

<!ELEMENT transition_descl (transition_descl_value, code*)>
<!ELEMENT transition_descl_value (effect)>
<!ELEMENT effect (#PCDATA)>

<!-- Transition features -->



camera_motion.dtd:



<!-- Camera motion features -->

<!ELEMENT camera_motion (background_affine_motion*)>
<!-- Affine model for camera motion detection -->

<!ELEMENT background_affine_motion (background_affine_motion_value, code*)>

<!ELEMENT background_affine_motion_value (panning?, zoom?)>

<!ELEMENT panning (direction?)> 
<!ATTLIST panning 

direction (NT|NE|ET|SE|ST|SW|WT|NW) #REQUIRED>

<!ELEMENT zoom (direction?)> 
<!ATTLIST zoom 

direction (IN|OUT) #REQUIRED> 

<!ELEMENT direction (#PCDATA)> 

<!-- Camera motion features end -->
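camera_motion.dtd records a panning direction and a zoom direction, but it does not say how these values are derived from the background affine motion. The following minimal sketch fills that gap with a simple heuristic chosen for illustration, not taken from the specification. It uses a six-parameter background model x' = a1 + a2*x + a3*y, y' = a4 + a5*x + a6*y in image coordinates. The camera is assumed to move opposite to the background translation, and the average diagonal term is read as a zoom factor. The thresholds, the function name `classify_camera_motion`, and the reading of NT/ET/ST/WT as north/east/south/west are all assumptions. Because `direction` is #REQUIRED, a `None` result means the corresponding element is omitted.

```python
import math

# Codes allowed by the <panning> "direction" attribute, ordered counter-clockwise
# from east. Reading NT/ET/ST/WT as north/east/south/west is an assumption.
SECTORS = ["ET", "NE", "NT", "NW", "WT", "SW", "ST", "SE"]


def classify_camera_motion(params, min_shift=0.5, min_zoom=0.02):
    """Map a 6-parameter background affine model to (panning, zoom).

    params = (a1, a2, a3, a4, a5, a6) with
        x' = a1 + a2*x + a3*y
        y' = a4 + a5*x + a6*y
    in image coordinates (x to the right, y downward). Each entry of the
    result is a DTD code, or None when the motion is below its (hypothetical)
    threshold.
    """
    a1, a2, _a3, a4, _a5, a6 = params
    # The camera moves opposite to the background shift. Negate x; y is kept
    # because image y grows downward while "north" points up.
    cam_x, cam_y = -a1, a4
    panning = None
    if math.hypot(cam_x, cam_y) >= min_shift:
        angle = math.degrees(math.atan2(cam_y, cam_x)) % 360
        panning = SECTORS[round(angle / 45) % 8]
    # Average diagonal term as an isotropic scale: > 1 means the background
    # appears to grow, i.e. the camera zooms in.
    scale = (a2 + a6) / 2
    zoom = None
    if scale - 1 >= min_zoom:
        zoom = "IN"
    elif 1 - scale >= min_zoom:
        zoom = "OUT"
    return panning, zoom


# Background shifted 3 pixels to the left: the camera panned east.
print(classify_camera_motion((-3.0, 1.0, 0.0, 0.0, 0.0, 1.0)))  # prints ('ET', None)
```

Real camera-motion estimation would first fit the affine parameters robustly to background motion vectors. This sketch assumes the parameters are already available.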



motion.dtd: 



<!-- Motion features -->
<!ELEMENT motion (affine_model*)>
<!-- Affine motion feature -->

<!ELEMENT affine_model (affine_model_value*, code*)>

<!ELEMENT affine_model_value (parameters?, trajectory?)> 

<!ELEMENT parameters (affine_bin*)> 
<!ATTLIST parameters

length CDATA #IMPLIED> 

<!ELEMENT affine_bin (#PCDATA)>
<!ATTLIST affine_bin
VIDEO DESCRIPTION SYSTEM AND METHOD

SPECIFICATION 

CROSS REFERENCE TO RELATED APPLICATIONS
This application claims priority to United States provisional patent
application Serial No. 60/118,020, filed February 1, 1999, United States provisional
patent application Serial No. 60/118,027, filed February 1, 1999, and United States
provisional patent application Serial No. 60/107,463, filed November 6, 1998.

FIELD OF THE INVENTION 
The present invention relates to techniques for describing multimedia
information, and more specifically, to techniques which describe video information
and the content of such information.

BACKGROUND OF THE INVENTION 
With the maturation of the global Internet and the widespread
employment of regional networks and local networks, digital multimedia information
has become increasingly accessible to consumers and businesses. Accordingly, it has
become progressively more important to develop systems that process, filter, search
and organize digital multimedia information, so that useful information can be culled
from this growing mass of raw information.

At the time of filing the instant application, solutions exist that allow
consumers and businesses to search for textual information. Indeed, numerous text-based
search engines, such as those provided by yahoo.com, goto.com, excite.com
and others, are available on the World Wide Web and are among the most visited Web
sites, which indicates the significance of the demand for such information retrieval
technology.

Unfortunately, the same is not true for multimedia content, as no
generally recognized description of this material exists. In this regard, there have
been past attempts to provide multimedia databases which permit users to search for
pictures using characteristics such as color, texture and shape information of video
objects embedded in the picture. However, at the close of the 20th century, it is not
yet possible to perform a general search of the Internet or most regional or local
networks for multimedia content, as no broadly recognized description of this material
exists. Moreover, the need to search for multimedia content is not limited to
databases, but extends to other applications, such as digital broadcast television and
multimedia telephony.

One industry-wide attempt to develop a standard multimedia
description framework has been through the Moving Picture Experts Group's
("MPEG") MPEG-7 standardization effort. Launched in October 1996, MPEG-7
aims to standardize content descriptions of multimedia data in order to facilitate
content-focused applications like multimedia searching, filtering, browsing and
summarization. A more complete description of the objectives of the MPEG-7
standard is contained in the International Organisation for Standardisation document
ISO/IEC JTC1/SC29/WG11 N2460 (Oct. 1998), the content of which is incorporated
by reference herein.

The MPEG-7 standard has the objective of specifying a standard set of 
descriptors as well as structures (referred to as "description schemes") for the 
descriptors and their relationships to describe various types of multimedia 
information. MPEG-7 also proposes to standardize ways to define other descriptors 
as well as "description schemes" for the descriptors and their relationships. This 
description, i.e. the combination of descriptors and description schemes, shall be 
associated with the content itself, to allow fast and efficient searching and filtering for 
material of a user's interest. MPEG-7 also proposes to standardize a language to 
specify description schemes, i.e. a Description Definition Language ("DDL"), and the 
schemes for the binary encoding of descriptions of multimedia content.

At the time of filing the instant application, MPEG is soliciting
proposals for techniques which will optimally implement the necessary description
schemes for future integration into the MPEG-7 standard. In order to provide such
optimized description schemes, three different multimedia-application arrangements
can be considered: the distributed processing scenario, the content-exchange
scenario, and the personalized viewing scenario, in which the format of
multimedia content can be adapted for personalized viewing.

Regarding distributed processing, a description scheme must provide 
the ability to interchange descriptions of multimedia material independently of any 
platform, any vendor, and any application, which will enable the distributed 
processing of multimedia content. The standardization of interoperable content 
descriptions will mean that data from a variety of sources can be plugged into a 
variety of distributed applications, such as multimedia processors, editors, retrieval 
systems, filtering agents, etc. Some of these applications may be provided by third
parties, generating a sub-industry of providers of multimedia tools that can work with 
the standardized descriptions of the multimedia data. 

A user should be permitted to access various content providers' web
sites to download content and associated indexing data, obtained by some low-level or
high-level processing, and proceed to access several tool providers' web sites to
download tools (e.g. Java applets) to manipulate the heterogeneous data descriptions
in particular ways, according to the user's personal interests. An example of such a
multimedia tool will be a video editor. An MPEG-7-compliant video editor will be able
to manipulate and process video content from a variety of sources if the description 
associated with each video is MPEG-7 compliant. Each video may come with varying 
degrees of description detail, such as camera motion, scene cuts, annotations, and 
object segmentations. 

A second scenario that will greatly benefit from an interoperable 
content-description standard is the exchange of multimedia content among 
heterogeneous multimedia databases. MPEG-7 aims to provide the means to express, 
exchange, translate, and reuse existing descriptions of multimedia material. 

Currently, TV broadcasters, radio broadcasters, and other content
providers manage and store an enormous amount of multimedia material. This 
material is currently described manually using textual information and proprietary 
databases. Without an interoperable content description, content users need to invest 
manpower to translate manually the descriptions used by each broadcaster into their 
own proprietary scheme. Interchange of multimedia content descriptions would be
possible if all the content providers embraced the same content description schemes. 
This is one of the objectives of MPEG-7. 

Finally, multimedia players and viewers that employ the description 
schemes must provide the users with innovative capabilities such as multiple views of 
the data configured by the user. The user should be able to change the display's 
configuration without requiring the data to be downloaded again in a different format 
from the content broadcaster. 

The foregoing examples only hint at the possible uses for richly 
structured data delivered in a standardized way based on MPEG-7. Unfortunately, no 
prior art techniques available at present are able to generically satisfy the distributed 
processing, content-exchange, or personalized viewing scenarios. Specifically, the 
prior art fails to provide a technique for capturing content embedded in multimedia 
information based on either generic characteristics or semantic relationships, or to 
provide a technique for organizing such content. Accordingly, there exists a need in 
the art for efficient content description schemes for generic multimedia information. 

SUMMARY OF THE INVENTION
It is an object of the present invention to provide a description scheme for 
video content. 

It is a further object of the present invention to provide a description scheme 
for video content which is extensible. 

It is another object of the present invention to provide a description scheme for 
video content which is scalable. 

It is yet another object of the present invention to provide a description scheme 
for video content which satisfies the requirements of proposed media standards, such 
as MPEG-7. 

It is an object of the present invention to provide systems and methods for 
describing video content. 

It is a further object of the present invention to provide systems and methods 
for describing video content which are extensible. 
It is another object of the present invention to provide systems and methods for
describing video content which are scalable.

It is yet another object of the present invention to provide systems and
methods for describing video content which satisfy the requirements of proposed
media standards, such as MPEG-7.

In accordance with the present invention, a first method of describing video 
content in a computer database record includes the steps of establishing a plurality of 
objects in the video; characterizing the objects with a plurality of features of the 
objects; and relating the objects in a hierarchy in accordance with the features. The
method can also include the further step of relating the objects in accordance with
at least one entity relation graph.

Preferably, the objects can take the form of local objects (such as a group of 
pixels within a frame), segment objects (which represent one or more frames of a 
video clip) and global objects. The objects can be extracted from the video content
automatically, semi-automatically, or manually.

The features used to define the video objects can include visual features, 
semantic features, media features, and temporal features. A further step in the method 
can include assigning feature descriptors to further define the features. 

In accordance with another embodiment of the invention, computer-readable
media is programmed with at least one video description record describing video
content. The video description record, which is preferably formed in accordance with
the methods described above, generally includes a plurality of objects in the video; a
plurality of features characterizing said objects; and a hierarchy relating at least a
portion of the video objects in accordance with said features.

Preferably, the description record for a video clip further includes at least one
entity relation graph. It is also preferred that the features include at least one of visual
features, semantic features, media features, and temporal features. Generally, the
features in the description record can be further defined with at least one feature
descriptor.

A system for describing video content and generating a video description
record in accordance with the present invention includes a processor, a video input
interface operably coupled to the processor for receiving the video content, a video
display operatively coupled to the processor, and a computer-accessible data storage
system operatively coupled to the processor. The processor is programmed to
generate a video description record of the video content for storage in the computer-accessible
data storage system by performing video object extraction processing,
entity relation graph processing, and object hierarchy processing of the video content.
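The three processing stages performed by the processor can be pictured as a small pipeline. The following is only a sketch under stated assumptions: the names `VideoObject`, `extract_objects`, `build_entity_relation_graph`, `build_object_hierarchy` and `describe_video` are hypothetical, the extraction stage is a stub returning pre-labelled objects instead of analysing frames, and the relation and containment rules are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class VideoObject:
    id: str
    kind: str  # "GLOBAL" or "LOCAL" (assumed labels; segment objects could be added)
    features: dict = field(default_factory=dict)


def extract_objects(video_frames):
    # Hypothetical stub: a real system would segment the frames and compute
    # visual, semantic, media and temporal features here.
    return [
        VideoObject("O0", "GLOBAL", {"media": {"num_frames": len(video_frames)}}),
        VideoObject("O1", "LOCAL", {"semantic": {"who": "Person A"}}),
        VideoObject("O2", "LOCAL", {"semantic": {"who": "Person B"}}),
    ]


def build_entity_relation_graph(objects):
    # Hypothetical rule: relate every pair of local objects with a placeholder relation.
    local = [o for o in objects if o.kind == "LOCAL"]
    return [(a.id, "Related To", b.id) for i, a in enumerate(local) for b in local[i + 1:]]


def build_object_hierarchy(objects):
    # Hypothetical rule: the global object contains every local object.
    root = next(o for o in objects if o.kind == "GLOBAL")
    return {root.id: [o.id for o in objects if o.kind == "LOCAL"]}


def describe_video(video_frames):
    objects = extract_objects(video_frames)
    return {
        "objects": {o.id: o for o in objects},
        "entity_relation_graph": build_entity_relation_graph(objects),
        "object_hierarchy": build_object_hierarchy(objects),
    }


record = describe_video(video_frames=[None] * 30)  # 30 placeholder frames
print(record["object_hierarchy"])       # {'O0': ['O1', 'O2']}
print(record["entity_relation_graph"])  # [('O1', 'Related To', 'O2')]
```

Keeping the stages as separate functions mirrors the separation the description draws between object extraction and the relational structures built afterwards.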

In this exemplary system, video object extraction processing can include video 
object extraction processing operations and video object feature extraction processing 
operations. 

BRIEF DESCRIPTION OF THE DRAWINGS
Further objects, features and advantages of the invention will become 
apparent from the following detailed description taken in conjunction with the 
accompanying figures showing illustrative embodiments of the invention, in which:

Figure 1A is an exemplary image for the image description system of
the present invention. 

Figure 1B is an exemplary object hierarchy for the image description
system of the present invention. 

Figure 1C is an exemplary entity relation graph for the image 
description system of the present invention. 

Figure 2 is an exemplary block diagram of the image description 
system of the present invention. 

Figure 3A is an exemplary object hierarchy for the image description 
system of the present invention. 

Figure 3B is another exemplary object hierarchy for the image 
description system of the present invention. 

Figure 4A is a representation of an exemplary image for the image 
description system of the present invention. 

Figure 4B is an exemplary clustering hierarchy for the image 
description system of the present invention. 

Figure 5 is an exemplary block diagram of the image description
system of the present invention. 

Figure 6 is an exemplary process flow diagram for the image 
description system of the present invention. 

Figure 7 is an exemplary block diagram of the image description 
system of the present invention. 

Figure 8 is another exemplary block diagram of the image
description system of the present invention. 

Figure 9 is a schematic diagram of a video description scheme (DS), 
in accordance with the present invention. 

Figure 10 is a pictorial diagram of an exemplary video clip, with a 
plurality of objects defined therein. 

Figure 11 is a graphical representation of an exemplary semantic
hierarchy illustrating exemplary relationships among objects in the video clip of 
Figure 10. 

Figure 12 is a graphical representation of an entity relation graph 
illustrating exemplary relationships among objects in the video clip of Figure 10. 

Figure 13 is a block diagram of a system for creating video content 
descriptions in accordance with the present invention. 

Figure 14 is a flow diagram illustrating the processing operations
involved in creating video content description records in accordance with the present
invention.

Throughout the figures, the same reference numerals and characters,
unless otherwise stated, are used to denote like features, elements, components or
portions of the illustrated embodiments. Moreover, while the subject invention will
now be described in detail with reference to the figures, it is done so in connection
with the illustrative embodiments. It is intended that changes and modifications can
be made to the described embodiments without departing from the true scope and
spirit of the subject invention as defined by the appended claims.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention constitutes a description scheme (DS) for 
images, wherein simple but powerful structures representing generic image data are 
utilized. Although the description scheme of the present invention can be used with 
any type of standard which describes image content, a preferred embodiment of the 
invention is used with the MPEG-7 standard. Although any Description Definition 
Language (DDL) may be used to implement the DS of the present invention, a 
preferred embodiment utilizes the Extensible Markup Language (XML), which is a
streamlined subset of SGML (Standard Generalized Markup Language, ISO 8879)
developed specifically for World Wide Web applications. SGML allows documents
to be self-describing, in the sense that they describe their own grammar by specifying
the tag set used in the document and the structural relationships that those tags
represent. XML retains the key SGML advantages in a language that is designed to
be vastly easier to learn, use, and implement than full SGML. A complete
description of XML can be found at the World Wide Web Consortium's web page on
XML, at http://www.w3.org/XML/, the contents of which are incorporated by reference
herein.

The primary components of a characterization of an image using the 
description scheme of the present invention are objects, feature classifications, object 
hierarchies, entity-relation graphs, multiple levels of abstraction, code downloading, 
and modality transcoding, all of which will be described in additional detail below. In 
the description scheme of the present invention, an image document is represented by 
a set of objects and relationships among objects. Each object may have one or more 
associated features, which are generally grouped into the following categories: media 
features, visual features, and semantic features. Each feature can include descriptors 
that can facilitate code downloading by pointing to external extraction and similarity 
matching code. Relationships among objects can be described by object hierarchies 
and entity-relation graphs. Object hierarchies can also include the concept of multiple 
levels of abstraction. Modality transcoding allows user terminals having different 
capabilities (such as Palm Pilots, cellular telephones, or different types of personal
computers (PCs), for example) to receive the same image content in different
resolutions and/or different modalities.

As described above, a preferred embodiment of the image description 
system of the present invention is used with the MPEG-7 standard. In accord with 
this standard, this preferred embodiment uses objects as the fundamental entity in 
describing various levels of image content, which can be defined along different 
dimensions. For example, objects can be used to describe image regions or groups of 
image regions. High-level objects can in turn be used to describe groups of primitive 
objects based on semantics or visual features. In addition, different types of features
can be used in connection with different levels of objects. For instance, visual 
features can be applied to objects corresponding to physical components in the image 
content, whereas semantic features can be applied to any level of object. 

In addition, the image description system of the present invention 
provides flexibility, extensibility, scalability and convenience of use. In the interest of 
enhanced flexibility, the present invention allows portions of the image description 
system to be instantiated, uses efficient categorization of features and clustering of
objects by way of a clustering hierarchy, and also supports efficient linking,
embedding and downloading of external feature descriptors and execution code. The 
present invention also provides extensibility by permitting elements defined in the 
description scheme to be used to derive new elements for different domains. 
Scalability is provided by the present invention's capability to define multiple 
abstraction levels based on any arbitrary set of criteria using object hierarchies. These 
criteria can be specified in terms of visual features (size and color, for example), 
semantic relevance (relevance to user interest profile, for example) and/or service 
quality (media features, for example). The present invention is convenient to use 
because it specifies a minimal set of components: namely, objects, feature classes, 
object hierarchies, and entity-relation graphs. Additional objects and features can be 
added in a modular and flexible way. In addition, different types of object hierarchies 
and entity-relation graphs can each be defined in a similar fashion. 

Under the image description system of the present invention, an image 
is represented as a set of image objects, which are related to one another by object 
hierarchies and entity-relation graphs. These objects can have multiple features which
can be linked to external extraction and similarity matching code. These features are 
categorized into media, visual, and semantic features, for example. Image objects can 
be organized in multiple different object hierarchies. Non-hierarchical relationships 
among two or more objects can be described using one or more different entity- 
relation graphs. For objects contained in large images, multiple levels of abstraction 
in clustering and viewing such objects can be implemented using object hierarchies. 
These multiple levels of abstraction in clustering and viewing such images can be 
based on media, visual, and/or semantic features, for example. One example of a 
media feature includes modality transcoding, which permits users having different 
terminal specifications to access the same image content in satisfactory modalities and 
resolutions. 

The characteristics and operation of the image description system of 
the present invention will now be presented in additional detail. Figs. 1A, 1B and 1C
depict an exemplary description of an exemplary image in accordance with the image
description system of the present invention. Fig. 1A depicts an exemplary set of
image objects and exemplary corresponding object features for those objects. More
specifically, Fig. 1A depicts image object 1 (i.e., O1) 2 ("Person A"), O2 6 ("Person
B") and O3 4 ("People") contained in O0 8 (i.e., the overall exemplary photograph),
as well as exemplary features 10 for the exemplary photograph depicted. Fig. 1B
depicts an exemplary spatial object hierarchy for the image objects depicted in
Fig. 1A, wherein O0 8 (the overall photograph) is shown to contain O1 2 ("Person A")
and O2 6 ("Person B"). Fig. 1C depicts an exemplary entity-relation (E-R) graph for the
image objects depicted in Fig. 1A, wherein O1 2 ("Person A") is characterized as
being located to the left of, and shaking hands with, O2 6 ("Person B").

Fig. 2 depicts an exemplary graphical representation of the image 
description system of the present invention, utilizing the conventional Unified 
Modeling Language (UML) format and notation. Specifically, the diamond-shaped 
symbols depicted in Fig. 2 represent the composition relationship. The range
associated with each element represents the number of occurrences permitted for that
element in the composition relationship. Specifically, the nomenclature "0..*" denotes
"greater than or equal to 0," and the nomenclature "1..*" denotes "greater than or equal to 1."

In the following discussion, the text appearing between the characters
"<" and ">" denotes the name of the referenced element as characterized in the
preferred XML embodiments which appear below. In the image description system of the
present invention as depicted in Fig. 2, an image element 22 (<image>), which
represents an image description, includes an image object set element 24
(<image_object_set>), and may also include one or more object hierarchy elements 26
(<object_hierarchy>) and one or more entity-relation graphs 28
(<entity_relation_graph>). Each image object set element 24 includes one or more
image object elements 30. Each image object element 30 may include one or more
features, such as media feature elements 36, visual feature elements 38 and/or
semantic feature elements 40. Each object hierarchy element 26 contains an object
node element 32, each of which may in turn contain one or more additional object
node elements 32. Each entity-relation graph 28 contains one or more entity relation
elements 34. Each entity relation element 34 in turn contains a relation element 44,
and may also contain one or more entity node elements 42.
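The composition rules of Fig. 2 can be mirrored in a few data classes. This is a minimal sketch, assuming Python dataclasses as a stand-in for the XML elements; the class names echo the element names but are hypothetical, and the `dangling_refs` helper is an illustrative consistency check rather than part of the described scheme. Nested entity relations, which the scheme also permits, are kept in a `nested` field.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ImageObject:  # <image_object>
    id: str
    type: str  # "GLOBAL" or "LOCAL"
    media_features: Dict[str, object] = field(default_factory=dict)
    visual_features: Dict[str, object] = field(default_factory=dict)
    semantic_features: Dict[str, object] = field(default_factory=dict)


@dataclass
class ObjectNode:  # <object_node>, may contain further object nodes
    id: str
    object_ref: str
    children: List["ObjectNode"] = field(default_factory=list)


@dataclass
class ObjectHierarchy:  # <object_hierarchy>
    root: ObjectNode
    type: Optional[str] = None


@dataclass
class EntityRelation:  # <entity_relation>
    relation: str  # exactly one <relation>
    entity_refs: List[str] = field(default_factory=list)  # zero or more <entity_node>
    nested: List["EntityRelation"] = field(default_factory=list)


@dataclass
class Image:  # <image>
    object_set: Dict[str, ImageObject]  # one <image_object_set>
    hierarchies: List[ObjectHierarchy] = field(default_factory=list)
    er_graphs: List[List[EntityRelation]] = field(default_factory=list)

    def dangling_refs(self) -> List[str]:
        """Object ids referenced by hierarchies or E-R graphs but absent from the object set."""
        refs: List[str] = []

        def walk_node(node: ObjectNode) -> None:
            refs.append(node.object_ref)
            for child in node.children:
                walk_node(child)

        def walk_relation(rel: EntityRelation) -> None:
            refs.extend(rel.entity_refs)
            for sub in rel.nested:
                walk_relation(sub)

        for hierarchy in self.hierarchies:
            walk_node(hierarchy.root)
        for graph in self.er_graphs:
            for rel in graph:
                walk_relation(rel)
        return [r for r in refs if r not in self.object_set]


# The example of Figs. 1A to 1C expressed with these classes.
objects = {oid: ImageObject(oid, t) for oid, t in
           [("O0", "GLOBAL"), ("O1", "LOCAL"), ("O2", "LOCAL"), ("O3", "LOCAL")]}
spatial = ObjectHierarchy(
    ObjectNode("ON0", "O0", [ObjectNode("ON1", "O1"), ObjectNode("ON2", "O2")]), "SPATIAL")
graph = [EntityRelation("Left Of", ["O1", "O2"]),
         EntityRelation("Shaking hands with", ["O1", "O2"])]
image = Image(objects, [spatial], [graph])
print(image.dangling_refs())  # []
```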

An object hierarchy element 26 is a special case of an entity-relation
graph 28, wherein the entities are related by containment relationships. The preferred
embodiment of the image description system of the present invention includes object
hierarchy elements 26 in addition to entity-relation graphs 28, because an object
hierarchy element 26 is a more efficient structure for retrieval than is an entity-relation
graph 28. In addition, an object hierarchy element 26 is the most natural
way of defining composite objects, and MPEG-4 objects are constructed using
hierarchical structures.

To maximize flexibility and generality, the image description system
of the present invention separates the definition of the objects from the structures that
describe relationships among the objects. Thus, the same object may appear in
different object hierarchies 26 and entity-relation graphs 28. This avoids the
undesirable duplication of features for objects that appear in more than one object
hierarchy 26 and/or entity-relation graph 28. In addition, an object can be defined
without the need for it to be included in any relational structure, such as an object
hierarchy 26 or entity-relation graph 28, so that the extraction of objects and relations
among objects can be performed at different stages, thereby permitting distributed
processing of the image content.
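The benefit of keeping object definitions apart from relational structures can be seen in a toy example. In this sketch, assuming plain Python dictionaries in place of XML, objects are stored once by identifier while each structure holds only identifiers; the helper `referenced_ids` is hypothetical. It shows that after a first stage (the spatial hierarchy of Fig. 1B) one object is still unplaced, and that a later stage (the "who" hierarchy of Fig. 3A) can attach it without copying any features.

```python
# Objects are defined once, keyed by their unique identifier.
object_set = {
    "O0": {"label": "Photograph"},
    "O1": {"label": "Person A"},
    "O2": {"label": "Person B"},
    "O3": {"label": "People"},
}

# Relational structures store identifiers only.
spatial_hierarchy = {"O0": ["O1", "O2"]}  # parent -> children, Fig. 1B
who_hierarchy = {"O3": ["O1", "O2"]}      # parent -> children, Fig. 3A
left_of = [("O1", "O2")]                  # spatial relation pairs, Fig. 1C


def referenced_ids(*structures):
    """Collect every object id used by the given hierarchies or relation lists."""
    ids = set()
    for structure in structures:
        if isinstance(structure, dict):
            for parent, children in structure.items():
                ids.add(parent)
                ids.update(children)
        else:
            for pair in structure:
                ids.update(pair)
    return ids


# Stage 1: only the spatial hierarchy exists, so "People" is not yet placed.
print(set(object_set) - referenced_ids(spatial_hierarchy))  # {'O3'}
# Stage 2: adding further structures places every object, with no duplicated features.
print(set(object_set) - referenced_ids(spatial_hierarchy, who_hierarchy, left_of))  # set()
```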

Referring to Figs. 1A, 1B, 1C and Fig. 2, an image object 30 refers to
one or more arbitrary regions of an image, and therefore can be either continuous or
discontinuous in space. In Figs. 1A, 1B and 1C, O1 2 ("Person A"), O2 6 ("Person
B"), and O0 8 (i.e., the photograph) are objects with only one associated continuous
region. On the other hand, O3 4 ("People") is an example of an object composed of
multiple regions separated from one another in space. A global object contains
features that are common to an entire image, whereas a local object contains only
features of a particular section of that image. Thus, in Figs. 1A, 1B and 1C, O0 8 is a
global object representing the entire image depicted, whereas O1 2, O2 6 and O3 4 are
each local objects representing a person or persons contained within the overall
image.

Various types of objects which can be used in connection with the 
present invention include visual objects, which are objects defined by visual features 
such as color or texture; media objects; semantic objects; and objects defined by a 
combination of semantic, visual, and media features. Thus, an object's type is 
determined by the features used to describe that object. As a result, new types of 
objects can be added as necessary. In addition, different types of objects may be 
derived from these generic objects by utilizing inheritance relationships, which are 
supported by the MPEG-7 standard. 

As depicted in Fig. 2, the set of all image object elements 30
(<image_object>) described in an image is contained within the image object set
element 24 (<image_object_set>). Each image object element 30 can have a unique
identifier within an image description. The identifier and the object type (e.g., local
or global) are expressed as the id and type attributes of the object element, respectively.
An exemplary implementation of an exemplary set of objects to describe the image
depicted in Figs. 1A, 1B and 1C is listed below in XML. In all XML listings
shown below, the text appearing between the characters "<!--" and "-->" denotes
comments to the XML code:

<image_object_set>
  <image_object id="O0" type="GLOBAL"> </image_object> <!-- Photograph -->
  <image_object id="O1" type="LOCAL"> </image_object> <!-- Person A -->
  <image_object id="O2" type="LOCAL"> </image_object> <!-- Person B -->
  <image_object id="O3" type="LOCAL"> </image_object> <!-- People -->
</image_object_set>
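A consumer of such a description only needs the id and type attributes to separate global from local objects. The sketch below, assuming Python's standard `xml.etree.ElementTree` module, parses the listing above and enforces the uniqueness of identifiers; the function name `objects_by_type` is hypothetical. ElementTree drops XML comments by default, so the annotations in the listing do not interfere.

```python
import xml.etree.ElementTree as ET

OBJECT_SET_XML = """
<image_object_set>
  <image_object id="O0" type="GLOBAL"> </image_object> <!-- Photograph -->
  <image_object id="O1" type="LOCAL"> </image_object> <!-- Person A -->
  <image_object id="O2" type="LOCAL"> </image_object> <!-- Person B -->
  <image_object id="O3" type="LOCAL"> </image_object> <!-- People -->
</image_object_set>
"""


def objects_by_type(xml_text):
    """Group image object ids by their type attribute, rejecting duplicate ids."""
    root = ET.fromstring(xml_text.strip())
    groups, seen = {}, set()
    for obj in root.iter("image_object"):
        oid = obj.get("id")
        if oid in seen:
            raise ValueError(f"duplicate object id {oid!r}")
        seen.add(oid)
        groups.setdefault(obj.get("type"), []).append(oid)
    return groups


print(objects_by_type(OBJECT_SET_XML))
# {'GLOBAL': ['O0'], 'LOCAL': ['O1', 'O2', 'O3']}
```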



As depicted in Fig. 2, image objects 30 may for example contain three
feature class elements that group features together according to the information
conveyed by those features. Examples of such feature class elements include media
features 36 (<img_obj_media_features>), visual features 38
(<img_obj_visual_features>), and semantic features 40 (<img_obj_semantic_features>).
Table 1 below denotes an exemplary list of features for each of these feature classes.

Table 1: Exemplary Feature Classes and Features.

Feature Class   Features
Semantic        Text Annotation, Who, What Object, What Action, Why, When, Where
Visual          Color, Texture, Position, Size, Shape, Orientation
Media           File Format, File Size, Color Representation, Resolution, Data File
                Location, Modality Transcoding, Author, Date of Creation

Each feature element contained in the feature classes in an image 
object element 30 will include descriptors in accordance with the MPEG-7 standard. 
Table 2 below denotes exemplary descriptors that may be associated with certain of
the exemplary visual features denoted in Table 1. Specific descriptors such as those
denoted in Table 2 may also contain links to external extraction and similarity 
matching code. Although Tables 1 and 2 denote exemplary features and descriptors, 
the image description system of the present invention may include, in an extensible 
and modular fashion, any number of features and descriptors for each object. 

Table 2: Exemplary Visual Features and Associated Descriptors.

Feature   Descriptors
Color     Color Histogram, Dominant Color, Color Coherence Vector, Visual
          Sprite Color
Texture   Tamura, MSAR, Edge Direction Histogram, DCT Coefficient
          Energies, Visual Sprite Texture
Shape     Bounding Box, Binary Mask, Chroma Key, Polygon Shape, Fourier
          Shape, Boundary, Size, Symmetry, Orientation

The XML example shown below denotes an example of how
features and descriptors can be defined to be included in an image object 30. In
particular, the example below defines the exemplary features 10 associated with the
global object O0 depicted in Figs. 1A, 1B and 1C, namely, two semantic features
("where" and "when"), one media feature ("file format"), and one visual feature
("color" with a "color histogram" descriptor). An object can be described by different
concepts (<concept>) in each of the semantic categories, as shown in the example
below.

<image_object id="O0" type="GLOBAL"> <!-- Global object: Photograph -->
  <img_obj_semantic_features>
    <where>
      <concept> Columbia University, NYC </concept>
      <concept> Outdoors </concept>
    </where>
    <when> <concept> 5/31/99 </concept> </when>
  </img_obj_semantic_features>
  <img_obj_media_features>
    <file_format> JPG </file_format>
  </img_obj_media_features>
  <img_obj_visual_features>
    <color>
      <color_histogram>
        <value format="float[166]"> .3 .03 .45 ... </value>
      </color_histogram>
    </color>
  </img_obj_visual_features>
</image_object>
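Because each descriptor sits at a fixed path inside its feature class, it can be read back with simple path queries and handed to matching code. The sketch below assumes Python's `xml.etree.ElementTree`, replaces the elided 166-bin histogram with a hypothetical complete 4-bin one, and uses an L1 histogram distance as a stand-in for the external similarity matching code that the scheme only points to; `read_features` and `l1_distance` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Same structure as the listing, with a hypothetical 4-bin histogram so the data is complete.
GLOBAL_OBJECT_XML = """<image_object id="O0" type="GLOBAL">
  <img_obj_semantic_features>
    <where>
      <concept> Columbia University, NYC </concept>
      <concept> Outdoors </concept>
    </where>
    <when> <concept> 5/31/99 </concept> </when>
  </img_obj_semantic_features>
  <img_obj_media_features>
    <file_format> JPG </file_format>
  </img_obj_media_features>
  <img_obj_visual_features>
    <color>
      <color_histogram>
        <value format="float[4]"> .3 .03 .45 .22 </value>
      </color_histogram>
    </color>
  </img_obj_visual_features>
</image_object>"""


def read_features(xml_text):
    """Return the 'where' concepts, the file format and the color histogram."""
    obj = ET.fromstring(xml_text)
    where = [c.text.strip() for c in obj.findall("img_obj_semantic_features/where/concept")]
    file_format = obj.findtext("img_obj_media_features/file_format").strip()
    value = obj.find("img_obj_visual_features/color/color_histogram/value")
    histogram = [float(v) for v in value.text.split()]
    return where, file_format, histogram


def l1_distance(h1, h2):
    # Assumed similarity measure, standing in for external matching code.
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h1, h2))


where, fmt, hist = read_features(GLOBAL_OBJECT_XML)
print(where, fmt)  # ['Columbia University, NYC', 'Outdoors'] JPG
print(round(l1_distance(hist, [.25, .05, .45, .25]), 2))  # 0.1
```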



As depicted in Fig. 2, in the image description system of the present
invention the object hierarchy element 26 can be used to organize the image objects
30 in the image object set 24, based on different criteria such as media features 36,
visual features 38, semantic features 40, or any combinations thereof. Each object
hierarchy element 26 constitutes a tree of object nodes 32 which reference image
object elements 30 in the image object set 24 via link 33.

An object hierarchy 26 involves a containment relation from one or
more child nodes to a parent node. This containment relation may be of numerous
different types, depending on the particular object features being utilized, such as
media features 36, visual features 38 and/or semantic features 40, for example. For
instance, the spatial object hierarchy depicted in Fig. 1B describes a visual
containment, because it is created in connection with a visual feature, namely spatial
position. Figs. 3A and 3B depict two additional exemplary object hierarchies.
Specifically, Fig. 3A depicts an exemplary hierarchy for the image objects depicted in
Fig. 1A, based on the "who" semantic feature as denoted in Table 1. Thus, in Fig. 3A,
O3 4 ("People") is shown to contain O1 2 ("Person A") and O2 6 ("Person B"). Fig.
3B depicts an exemplary hierarchy based on exemplary color and shape visual
features such as those denoted in Table 1. In Fig. 3B, O7 46 could for example be
defined to be the corresponding region of an object satisfying certain specified color
and shape constraints. Thus, Fig. 3B depicts O7 46 ("Skin Tone & Shape") as
containing O4 48 ("Face Region 1") and O6 50 ("Face Region 2"). Object hierarchies
26 combining different features can also be constructed to satisfy the requirements of
a broad range of application systems.

As further depicted in Fig. 2, each object hierarchy element 26
(<object_hierarchy>) contains a tree of object nodes (ONs) 32. The object hierarchies
also may include an optional string attribute giving the type of the hierarchy. If such a
type attribute is present, a thesaurus can provide its values so that
applications can determine the types of hierarchies which exist. Every object node 32
(<object_node>) references an image object 30 in the image object set 24 via link 33.
Image objects 30 also can reference back to the object nodes 32 referencing them via
link 33. This bi-directional linking mechanism permits efficient traversal from
image objects 30 in the image object set 24 to the corresponding object nodes 32 in
the object hierarchy 26, and vice versa. Each object node 32 references an image
object 30 through an attribute (object_ref) by using the unique identifier of the image
object. Each object node 32 may also contain a unique identifier in the form of an
attribute. These unique identifiers for the object nodes 32 enable the objects 30 to
reference back to the object nodes which reference them using another attribute
(object_node_ref). An exemplary XML implementation of the exemplary spatial
object hierarchy depicted in Fig. 1B is expressed below.

<object_hierarchy type="SPATIAL"> <!-- Object hierarchy: spatial hierarchy -->
  <object_node id="ON0" object_ref="O0"> <!-- Photograph -->
    <object_node id="ON1" object_ref="O1"> </object_node> <!-- Person A -->
    <object_node id="ON2" object_ref="O2"> </object_node> <!-- Person B -->
  </object_node>
</object_hierarchy>
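The bi-directional links can be materialized as two lookup tables built in a single pass over the hierarchy. The sketch below assumes Python's `xml.etree.ElementTree`; the helpers `index_hierarchy` and `containing_objects` are hypothetical, and the `object_to_nodes` table plays the role that the object_node_ref attribute plays in the scheme. It answers "which objects contain O1?" by walking from the object to its node and then up the tree.

```python
import xml.etree.ElementTree as ET

SPATIAL_XML = """<object_hierarchy type="SPATIAL">
  <object_node id="ON0" object_ref="O0">
    <object_node id="ON1" object_ref="O1"> </object_node>
    <object_node id="ON2" object_ref="O2"> </object_node>
  </object_node>
</object_hierarchy>"""


def index_hierarchy(xml_text):
    """Build forward and backward link tables for one object hierarchy."""
    root = ET.fromstring(xml_text)
    node_to_object = {}   # object_node id -> object_ref (forward link)
    object_to_nodes = {}  # object id -> object_node ids (backward link, cf. object_node_ref)
    parent_of = {}        # object_node id -> id of the enclosing object_node
    for node in root.iter("object_node"):
        node_id, object_ref = node.get("id"), node.get("object_ref")
        node_to_object[node_id] = object_ref
        object_to_nodes.setdefault(object_ref, []).append(node_id)
        for child in node.findall("object_node"):
            parent_of[child.get("id")] = node_id
    return node_to_object, object_to_nodes, parent_of


def containing_objects(object_id, index):
    """Return the objects that contain object_id, nearest first."""
    node_to_object, object_to_nodes, parent_of = index
    result = []
    for node_id in object_to_nodes.get(object_id, []):
        while node_id in parent_of:
            node_id = parent_of[node_id]
            result.append(node_to_object[node_id])
    return result


index = index_hierarchy(SPATIAL_XML)
print(index[1]["O2"])                   # ['ON2']
print(containing_objects("O1", index))  # ['O0']
```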

Object hierarchies 26 can also be used to build clustering hierarchies 
and to generate multiple levels of abstraction. In describing relatively large images, 
such as satellite photograph images for example, a problem normally arises in 
describing and retrieving, in an efficient and scalable manner, the many objects
normally contained in such images. Clustering hierarchies can be used in connection 
with the image description system of the present invention to provide a solution to 
this problem. 

Figs. 4A and 4B depict an exemplary use of a clustering hierarchy
scheme wherein objects are clustered hierarchically based on their respective size
(<size>). In particular, Fig. 4A depicts a representation of a relatively large image,
such as a satellite photograph image for example, wherein objects O11 52, O12 54,
O13 56, O14 58 and O15 60 represent image objects of varying size, such as lakes on
the earth's surface for example, contained in the large image. Fig. 4B represents an
exemplary size-based clustering hierarchy for the objects depicted in Fig. 4A, wherein
objects O11 52, O12 54, O13 56, O14 58 and O15 60 represent the objects depicted in
Fig. 4A, and wherein additional objects O16 62, O17 64 and O18 56 represent objects
which specify the size-based criteria for the clustering hierarchy depicted in Fig. 4B. In
particular, objects O16 62, O17 64 and O18 56 may for example represent
intermediate nodes 32 of an object hierarchy 26, which intermediate nodes are
represented as image objects 30. These objects include the criteria, conditions and
constraints related to the size feature used for grouping the objects together in the
depicted clustering hierarchy. In the particular example depicted in Fig. 4B, objects
O16 62, O17 64 and O18 56 are used to form a clustering hierarchy having three
hierarchical levels based on size. Object O16 62, the root, represents the size criteria
that form the clustering hierarchy. Object O17 64 represents a second level of size
criteria of less than 50 units, wherein such units may represent pixels for example;
object O18 56 represents a third level of size criteria of less than 10 units. Thus, as
depicted in Fig. 4B, objects O11 52, O12 54, O13 56, O14 58 and O15 60 are each
characterized as having a specified size of a certain number of units. Similarly,
objects O13 56, O14 58 and O15 60 are each characterized as having a specified size
of less than 50 units, and object O15 60 is characterized as having a specified size of
less than 10 units.
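The grouping rule of Fig. 4B, in which each object is attached to the deepest "less than" criterion it satisfies, can be written as a short loop. This is a sketch under assumptions: `cluster_by_size` is a hypothetical helper, and the sizes are made up except for O11, for which the description gives 120 pixels; they were chosen only so that the result matches the grouping described for Fig. 4B.

```python
def cluster_by_size(sizes, thresholds):
    """Place each object under the deepest "less than" size criterion it satisfies.

    sizes:      object id -> size (e.g. in pixels)
    thresholds: strictly decreasing upper bounds, one per criterion level below the root
    Returns one list per hierarchy level: level 0 holds objects attached directly to
    the root criterion node, level k holds objects under the "size < thresholds[k-1]" node.
    """
    if any(a <= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly decreasing")
    levels = [[] for _ in range(len(thresholds) + 1)]
    for obj_id, size in sorted(sizes.items()):
        depth = 0
        while depth < len(thresholds) and size < thresholds[depth]:
            depth += 1
        levels[depth].append(obj_id)
    return levels


# Hypothetical sizes in pixels; only O11 = 120 comes from the description.
sizes = {"O11": 120, "O12": 80, "O13": 30, "O14": 20, "O15": 5}
print(cluster_by_size(sizes, [50, 10]))
# [['O11', 'O12'], ['O13', 'O14'], ['O15']]
```

The two thresholds correspond to the criterion objects O17 (less than 50 units) and O18 (less than 10 units), with O16 acting as the root.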

Although Figs. 4A and 4B depict an example of a single clustering
hierarchy based on only a single set of criteria, namely size, multiple clustering
hierarchies using different criteria involving multiple features may also be used for
any image. For example, such clustering hierarchies may group together objects 
based on any combination of media, visual, and/or semantic features. This procedure 
is similar to the procedure used to cluster images together in visual information 
retrieval engines. Each object contained within the overall large image is assigned an 
image object 30 in the object set 24, and may also be assigned certain associated 
features such as media features 36, visual features 38 or semantic features 40. The 
intermediate nodes 32 of the object hierarchy 26 are represented as image objects 30, 
and also include the criteria, conditions and constraints related to one or more features 
used for grouping the objects together at that particular level. An image description 
may include any number of clustering hierarchies. The exemplary clustering 
hierarchy depicted in Figs. 4A and 4B is expressed in an exemplary XML 
implementation below. 

<image>
  <image_object_set>
    <image_object type="LOCAL" id="O11"> <!-- Real objects of the image -->
      <size> <num_pixels> 120 </num_pixels> </size>
    </image_object> <!-- Other objects -->
    <image_object type="LOCAL" id="O17"> <!-- Intermediate nodes in the hierarchy -->
      <size> <num_pixels> <less_than> 50 </less_than> </num_pixels> </size>
    </image_object> <!-- Other objects -->
  </image_object_set>
  <object_hierarchy>
    <object_node id="ON11" object_ref="O16">
      <object_node id="ON12" object_ref="O11"/>
      <object_node id="ON13" object_ref="O12"/>
      <object_node id="ON14" object_ref="O17">
        <object_node id="ON15" object_ref="O13"/>
        <object_node id="ON16" object_ref="O14"/>
        <object_node id="ON17" object_ref="O18">
          <object_node id="ON18" object_ref="O15"/>
        </object_node>
      </object_node>
    </object_node>
  </object_hierarchy>
</image>



As depicted in the multi-level clustering hierarchy example of Figs. 4A
and 4B, and as denoted in Table 3 below, three levels of abstraction are defined
based on the size of the objects depicted. This multi-level abstraction scheme
provides a scalable method for retrieving and viewing objects in the image depicted in
Fig. 4A. Such an approach can also be used to represent multiple abstraction levels
based on other features, such as various semantic classes for example.

Table 3: Objects in Each Abstraction Level

Abstraction Level   Objects
1                   O11, O12
2                   O11, O12, O13, O14
3                   O11, O12, O13, O14, O15
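The abstraction levels in Table 3 follow directly from the depth of each real object in the clustering hierarchy: level k shows every leaf that sits at most k steps below the root. The sketch below assumes Python's `xml.etree.ElementTree`, reuses the object hierarchy from the listing above, and treats childless nodes as real objects and nodes with children as criterion nodes; `abstraction_levels` is a hypothetical helper. Its output reproduces Table 3.

```python
import xml.etree.ElementTree as ET

CLUSTER_XML = """<object_hierarchy>
  <object_node id="ON11" object_ref="O16">
    <object_node id="ON12" object_ref="O11"/>
    <object_node id="ON13" object_ref="O12"/>
    <object_node id="ON14" object_ref="O17">
      <object_node id="ON15" object_ref="O13"/>
      <object_node id="ON16" object_ref="O14"/>
      <object_node id="ON17" object_ref="O18">
        <object_node id="ON18" object_ref="O15"/>
      </object_node>
    </object_node>
  </object_node>
</object_hierarchy>"""


def abstraction_levels(xml_text):
    """Return, for each abstraction level, the real (leaf) objects visible at that level."""
    root_node = ET.fromstring(xml_text).find("object_node")
    leaves_at_depth = {}

    def walk(node, depth):
        children = node.findall("object_node")
        if not children:  # assumption: leaves are real objects, inner nodes are criteria
            leaves_at_depth.setdefault(depth, []).append(node.get("object_ref"))
        for child in children:
            walk(child, depth + 1)

    walk(root_node, 0)
    levels, visible = [], []
    for depth in sorted(leaves_at_depth):
        visible = visible + leaves_at_depth[depth]
        levels.append(visible)
    return levels


for level, objs in enumerate(abstraction_levels(CLUSTER_XML), start=1):
    print(level, ", ".join(objs))
# 1 O11, O12
# 2 O11, O12, O13, O14
# 3 O11, O12, O13, O14, O15
```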



Although such hierarchical structures are suitable for purposes of 
retrieving images, certain relationships among objects cannot adequately be expressed 
using such structures. Thus, as depicted in Figs. 1C and 2, the image description 
system of the present invention also utilizes entity-relation (E-R) graphs 28 for the 
specification of more complex relationships among objects. An entity-relation graph 
28 is a graph of one or more entity nodes 42 and the relationships among them. Table 
4 below denotes several different exemplary types of such relationships, as well as 
specific examples of each. 
Table 4: Examples of relation types and relations.

Relation Type          Relations
Spatial: Directional   Top Of, Bottom Of, Right Of, Left Of, Upper Left Of,
                       Upper Right Of, Lower Left Of, Lower Right Of
Spatial: Topological   Adjacent To, Neighboring To, Nearby, Within, Contain
Semantic               Relative Of, Belongs To, Part Of, Related To, Same As, Is A,
                       Consist Of
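Directional relations such as those in Table 4 could be derived automatically from object geometry. The description does not specify how; the sketch below assumes axis-aligned bounding boxes in image coordinates (x grows to the right, y grows downward) and a simple non-overlap test per axis. The `Box` type, the `directional_relation` helper and the box coordinates are all hypothetical.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    x0: float  # left
    y0: float  # top (y grows downward, as in image coordinates)
    x1: float  # right
    y1: float  # bottom


def directional_relation(a: Box, b: Box) -> Optional[str]:
    """Name the Table 4 directional relation of box a with respect to box b, if any."""
    horizontal = "Left" if a.x1 <= b.x0 else "Right" if a.x0 >= b.x1 else None
    vertical = "Top" if a.y1 <= b.y0 else "Bottom" if a.y0 >= b.y1 else None
    if horizontal and vertical:
        prefix = "Upper" if vertical == "Top" else "Lower"
        return f"{prefix} {horizontal} Of"
    if horizontal:
        return f"{horizontal} Of"
    if vertical:
        return f"{vertical} Of"
    return None  # boxes overlap on both axes; a topological relation may apply instead


# Hypothetical boxes for "Person A" (O1) and "Person B" (O2) of Fig. 1A.
person_a = Box(10, 40, 60, 200)
person_b = Box(80, 35, 130, 210)
print(directional_relation(person_a, person_b))  # Left Of
```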



Entity-relation graphs can be of any general structure, and can also be 
customized for any particular application by utilizing various inheritance 
relationships. The exemplary entity-relation graph depicted in Fig. 1C describes an
exemplary spatial relationship, namely "Left Of", and an exemplary semantic
relationship, namely "Shaking Hands With", between objects O1 2 and O2 6 depicted
in Fig. 1A.

As depicted in Fig. 2, the image description system of the present
invention allows for the specification of zero or more entity-relation graphs 28
(<entity_relation_graph>). An entity-relation graph 28 includes one or more sets of
entity-relation elements 34 (<entity_relation>), and also contains two optional
attributes, namely a unique identifier (ID) and a string (type) describing the binding
expressed by the entity-relation graph 28. Values for such types could, for example, be
provided by a thesaurus. Each entity-relation element 34 contains one relation
element 44 (<relation>), and may also contain one or more entity node elements 42
(<entity_node>) and one or more entity-relation elements 34. The relation element 44
contains the specific relationship being described. Each entity node element 42
references an image object 30 in the image object set 24 via link 43 by means of an
attribute, namely object_ref. Via link 43, image objects 30 can also reference back to
the entity nodes 42 that reference them by means of an attribute
(event_code_refs).
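
The two directions of link 43 can be illustrated with a short Python sketch. It assumes, for illustration only, that the back-reference attribute holds a space-separated list of entity node identifiers (in the style of an XML IDREFS value). The class and function names are hypothetical, and the node assignments mirror the Fig. 1C example.

```python
from dataclasses import dataclass


@dataclass
class EntityNode:
    node_id: str     # unique entity node identifier, e.g. "ETN1"
    object_ref: str  # id of the referenced image object, e.g. "O1"


def back_references(nodes: list[EntityNode]) -> dict[str, str]:
    """Invert the object_ref links: map each referenced image object to the
    ids of the entity nodes pointing at it (assumed space-separated format)."""
    refs: dict[str, list[str]] = {}
    for node in nodes:
        refs.setdefault(node.object_ref, []).append(node.node_id)
    return {obj: " ".join(ids) for obj, ids in refs.items()}


nodes = [
    EntityNode("ETN1", "O1"), EntityNode("ETN2", "O2"),
    EntityNode("ETN3", "O2"), EntityNode("ETN4", "O1"),
]
print(back_references(nodes))  # {'O1': 'ETN1 ETN4', 'O2': 'ETN2 ETN3'}
```

Deriving the back-references from object_ref in this way keeps the two directions of the link consistent by construction.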

As depicted in the exemplary entity-relation graph 28 of Fig. 1C, the
entity-relation graph 28 contains two entity relations 34 between object O1 2 ("Person
A") and object O2 6 ("Person B"). The first such entity relation 34 describes the
spatial relation 44 regarding how object O1 2 is positioned with respect to (i.e., to the
"Left Of") object O2 6. The second such entity relation 34 depicted in Fig. 1C
describes the semantic relation 44 of how object O1 2 is "Shaking Hands With" object
O2 6. An exemplary XML implementation of the entity-relation graph example
depicted in Fig. 1C is shown below:



<entity_relation_graph>
  <entity_relation> <!-- Spatial, directional entity relation -->
    <relation type="SPATIAL.DIRECTIONAL"> Left Of </relation>
    <entity_node id="ETN1" object_ref="O1"/> <entity_node id="ETN2" object_ref="O2"/>
  </entity_relation>
  <entity_relation> <!-- Semantic entity relation -->
    <relation type="SEMANTIC"> Shaking hands with </relation>
    <entity_node id="ETN3" object_ref="O2"/> <entity_node id="ETN4" object_ref="O1"/>
  </entity_relation>
</entity_relation_graph>
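
As a rough check of the Fig. 1C example, the following Python sketch parses it with the standard `xml.etree.ElementTree` module and lists each relation together with the objects it binds. The function name `relation_tuples` is hypothetical; the element and attribute names are taken from the example.

```python
import xml.etree.ElementTree as ET

# The entity-relation graph of Fig. 1C.
GRAPH_XML = """
<entity_relation_graph>
  <entity_relation>
    <relation type="SPATIAL.DIRECTIONAL"> Left Of </relation>
    <entity_node id="ETN1" object_ref="O1"/>
    <entity_node id="ETN2" object_ref="O2"/>
  </entity_relation>
  <entity_relation>
    <relation type="SEMANTIC"> Shaking hands with </relation>
    <entity_node id="ETN3" object_ref="O2"/>
    <entity_node id="ETN4" object_ref="O1"/>
  </entity_relation>
</entity_relation_graph>
"""


def relation_tuples(xml_text: str) -> list[tuple[str, str, list[str]]]:
    """Return (relation type, relation name, referenced object ids) for each
    top-level entity_relation element."""
    root = ET.fromstring(xml_text)
    result = []
    for er in root.findall("entity_relation"):
        relation = er.find("relation")
        objects = [node.get("object_ref") for node in er.findall("entity_node")]
        result.append((relation.get("type"), relation.text.strip(), objects))
    return result


for row in relation_tuples(GRAPH_XML):
    print(row)
```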



For purposes of efficiency, entity-relation elements 34 may also
include one or more other entity-relation elements 34, as depicted in Fig. 2. This
allows the creation of efficient nested graphs of entity relationships, such as those
utilized in the Synchronized Multimedia Integration Language (SMIL), which
synchronizes different media documents by using a series of nested parallel and
sequential relationships.
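
The sketch below shows, under stated assumptions, how an application might traverse nested entity-relation elements depth-first. The nested graph itself, including the TEMPORAL relation type and the Sequential/Parallel relation names (chosen to echo SMIL's sequential and parallel grouping), is hypothetical and is not taken from the figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested graph: an outer "Sequential" relation whose second
# member is itself a "Parallel" relation, echoing SMIL's seq/par nesting.
NESTED_XML = """
<entity_relation_graph>
  <entity_relation>
    <relation type="TEMPORAL"> Sequential </relation>
    <entity_node id="ETN1" object_ref="O1"/>
    <entity_relation>
      <relation type="TEMPORAL"> Parallel </relation>
      <entity_node id="ETN2" object_ref="O2"/>
      <entity_node id="ETN3" object_ref="O3"/>
    </entity_relation>
  </entity_relation>
</entity_relation_graph>
"""


def walk(er: ET.Element, depth: int = 0) -> list[tuple[int, str, list[str]]]:
    """Depth-first traversal of one entity_relation and the entity_relation
    elements nested inside it, returning (depth, relation, object refs)."""
    name = er.find("relation").text.strip()
    objects = [node.get("object_ref") for node in er.findall("entity_node")]
    rows = [(depth, name, objects)]
    for child in er.findall("entity_relation"):  # direct children only
        rows.extend(walk(child, depth + 1))
    return rows


root = ET.fromstring(NESTED_XML)
for top in root.findall("entity_relation"):
    for depth, name, objects in walk(top):
        print("  " * depth + f"{name}: {objects}")
```

Because `findall` only matches direct children, each object reference is reported at the nesting level where it actually appears.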

An object hierarchy 26 is a particular type of entity-relation graph 28
and therefore can be implemented using an entity-relation graph 28, wherein entities
are related by containment relationships. Containment relationships are topological
relationships such as those denoted in Table 4. To illustrate that an object hierarchy
26 is a particular type of entity-relation graph 28, the exemplary object hierarchy
26 depicted in Fig. 1B is expressed below in XML as an entity-relation graph 28.

<entity_relation_graph>
  <entity_relation>
    <relation type="SPATIAL"> Contain </relation>
    <entity_node object_ref="O0"/> <entity_node object_ref="O1"/>
  </entity_relation>
  <entity_relation>
    <relation type="SPATIAL"> Contain </relation>
    <entity_node object_ref="O0"/> <entity_node object_ref="O2"/>
  </entity_relation>
</entity_relation_graph>

The exemplary hierarchy depicted in Fig. 1B describes how object O0
8 (the overall photograph) spatially contains objects O1 2 ("Person A") and O2 6
("Person B"). Thus, based on particular requirements, applications may implement
hierarchies utilizing either the convenience of the comprehensive structure of an
entity-relation graph 28, or alternatively the efficiency of object
hierarchies 26.
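
The equivalence between an object hierarchy and a containment-only entity-relation graph can be sketched as a mechanical conversion. In the Python code below, the dictionary representation of the hierarchy and the function name `hierarchy_to_graph` are assumptions; the emitted element names and the "Contain" relation follow the XML example for Fig. 1B.

```python
import xml.etree.ElementTree as ET

# Object hierarchy of Fig. 1B as a hypothetical parent -> children mapping:
# O0 (the whole photograph) contains O1 ("Person A") and O2 ("Person B").
HIERARCHY = {"O0": ["O1", "O2"]}


def hierarchy_to_graph(hierarchy: dict[str, list[str]]) -> ET.Element:
    """Rewrite an object hierarchy as an entity-relation graph in which each
    parent/child edge becomes one "Contain" entity_relation."""
    graph = ET.Element("entity_relation_graph")
    for parent, children in hierarchy.items():
        for child in children:
            er = ET.SubElement(graph, "entity_relation")
            relation = ET.SubElement(er, "relation", type="SPATIAL")
            relation.text = " Contain "
            ET.SubElement(er, "entity_node", object_ref=parent)
            ET.SubElement(er, "entity_node", object_ref=child)
    return graph


graph = hierarchy_to_graph(HIERARCHY)
print(ET.tostring(graph, encoding="unicode"))
```

Deeper hierarchies only add entries to the mapping. Converting in the other direction is possible only when a graph contains nothing but such containment relations arranged as a tree.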

For image descriptors associated with any type of features, such as
media features 36, visual features 38 or semantic features 40 for example, the image
description system of the present invention may also contain links to extraction and
similarity matching code in order to facilitate code downloading, as illustrated in the
XML example below. These links provide a mechanism for efficient searching and
filtering of image content from different sources using proprietary descriptors. Each
image descriptor in the image description system of the present invention may include
a descriptor value and a code element, the latter containing information regarding the
extraction and similarity matching code for that particular descriptor. The code
elements (<code>) may also include pointers to the executable files (<location>), as
well as the description of the input parameters (<input_parameters>) and output
parameters (<output_parameters>) for executing the code. Information about the type
of code (namely, extraction code or similarity matching code), the code language
(such as Java or C for example), and the code version are defined as particular
attributes of the code element.

The exemplary XML implementation set forth below provides a
description of a so-called Tamura texture feature, as set forth in H. Tamura, S. Mori,
and T. Yamawaki, "Textural Features Corresponding to Visual Perception," IEEE
Transactions on Systems, Man and Cybernetics, Vol. 8, No. 6, June 1978, the entire
content of which is incorporated herein by reference. The Tamura texture feature
description provides the specific feature values (namely, coarseness, contrast, and
directionality) and also links to external code for feature extraction and similarity
matching. In the feature extraction example shown below, additional information about
input and output parameters is also provided. Such a description could, for example, be
generated by a search engine in response to a texture query from a meta search engine.
The meta search engine could then use the code to extract the same feature descriptor
from the results received from other search engines, in order to generate a
homogeneous list of results for a user. In other cases, only the extraction and
similarity matching code, but not the specific feature values, is included. If necessary
in such instances, filtering agents may be used to extract feature values for processing.

The exemplary XML implementation shown below also illustrates the
way in which the XML language enables externally defined description schemes for
descriptors to be imported and combined into the image description system of the
present invention. In the example below, an external descriptor for the Chroma Key
shape feature is imported into the image description by using XML namespaces.
Using this framework, new features, types of features, and image descriptors can be
conveniently included in an extensible and modular way.

<texture> <tamura>
  <tamura_value coarseness="0.01" contrast="0.39" directionality="0.7"/>
  <code type="EXTRACTION" language="JAVA" version="1.1"> <!-- Link to extraction code -->
    <location> <location_site href="ftp://extract.tamura.java"/> </location>
    <input_parameters> <parameter name="image" type="PPM"/>
    </input_parameters>
    <output_parameters>
      <parameter name="tamura texture" type="double[3]"/>
    </output_parameters>
  </code>
  <code type="DISTANCE" language="JAVA" version="4.2"> <!-- Link to similarity code -->
    <location> <location_site href="ftp://distance.tamura.java"/> </location>
  </code>
</tamura> </texture>

<shape> <!-- Import external shape descriptor DTD -->
  <chromaKeyShape xmlns:extShape="http://www.other.ds/chromaKeyShape.dtd">
    <extShape:HueRange>
      <extShape:start> 40 </extShape:start> <extShape:end> 40 </extShape:end>
    </extShape:HueRange>
  </chromaKeyShape>
</shape>
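
The following Python sketch suggests how a meta search engine might read such a descriptor. It pulls out the Tamura feature values, maps each code type to its download location, and reads an element imported from the external namespace. The wrapping `<descriptors>` root is a hypothetical addition, needed only because the fragment has two top-level elements. The input and output parameter lists are omitted for brevity, and downloading or executing the linked code is not shown.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the descriptor example, wrapped in a hypothetical
# <descriptors> root so that the two top-level elements parse as one document.
DESCRIPTOR_XML = """
<descriptors>
  <texture> <tamura>
    <tamura_value coarseness="0.01" contrast="0.39" directionality="0.7"/>
    <code type="EXTRACTION" language="JAVA" version="1.1">
      <location> <location_site href="ftp://extract.tamura.java"/> </location>
    </code>
    <code type="DISTANCE" language="JAVA" version="4.2">
      <location> <location_site href="ftp://distance.tamura.java"/> </location>
    </code>
  </tamura> </texture>
  <shape>
    <chromaKeyShape xmlns:extShape="http://www.other.ds/chromaKeyShape.dtd">
      <extShape:HueRange>
        <extShape:start> 40 </extShape:start> <extShape:end> 40 </extShape:end>
      </extShape:HueRange>
    </chromaKeyShape>
  </shape>
</descriptors>
"""
# Prefix map used to address elements imported from the external scheme.
EXT = {"extShape": "http://www.other.ds/chromaKeyShape.dtd"}

root = ET.fromstring(DESCRIPTOR_XML)
tamura = root.find("texture/tamura")

# Feature values (coarseness, contrast, directionality) as floats.
values = {name: float(v) for name, v in tamura.find("tamura_value").attrib.items()}

# Code type (EXTRACTION or DISTANCE) -> location of the downloadable code.
code_links = {
    code.get("type"): code.find("location/location_site").get("href")
    for code in tamura.findall("code")
}

# Imported elements are matched by namespace URI, not by the literal prefix.
hue = root.find("shape/chromaKeyShape/extShape:HueRange", EXT)
hue_start = int(hue.find("extShape:start", EXT).text)  # int() ignores padding spaces

print(values)
print(code_links)
print(hue_start)
```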

The image description system of the present invention also supports 
modality transcoding. In an exemplary instance in which a content broadcaster must 
transmit image content to a variety of users, the broadcaster must transcode the image 
content into different media modalities and resolutions, in order to accommodate the 
users' various terminal requirements and bandwidth limitations. The image 
description system of the present invention supports modality transcoding in
connection with both local and global objects. For a given image object, the description
specifies the media modality, resolution, and location of transcoded versions of that
object, or alternatively links to external transcoding code. The image
descriptor in question also can point to code for transcoding an image object into
different modalities and resolutions, in order to satisfy the requirements of different 
user terminals. The exemplary XML implementation shown below illustrates 
providing an audio transcoded version for an image object. 

<image_object type="GLOBAL" id="O0">
  <img_obj_media_features>
    <location> <location_site href="Hi.gif"/> </location>
    <modality_transcoding>
      <modality_object_set>
        <modality_object id="mo2" type="AUDIO" resolution="1">
          <location> <location_site href="Hi.au.xmr?o1"/> </location>
        </modality_object>
      </modality_object_set>
    </modality_transcoding>
  </img_obj_media_features>
</image_object>
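
A terminal-side selection step could look like the Python sketch below. The selection policy (serve the original image when the terminal can display images, otherwise the first transcoded version whose modality the terminal supports) and the function name `pick_version` are assumptions made for illustration. The XML is a copy of the audio transcoding example.

```python
import xml.etree.ElementTree as ET

# Copy of the audio transcoding example.
OBJECT_XML = """
<image_object type="GLOBAL" id="O0">
  <img_obj_media_features>
    <location> <location_site href="Hi.gif"/> </location>
    <modality_transcoding>
      <modality_object_set>
        <modality_object id="mo2" type="AUDIO" resolution="1">
          <location> <location_site href="Hi.au.xmr?o1"/> </location>
        </modality_object>
      </modality_object_set>
    </modality_transcoding>
  </img_obj_media_features>
</image_object>
"""


def pick_version(xml_text: str, supported: set[str]) -> str:
    """Return the location of a version the terminal can render.

    Hypothetical policy: terminals that support IMAGE get the original;
    otherwise the first transcoded modality listed in `supported` wins.
    """
    media = ET.fromstring(xml_text).find("img_obj_media_features")
    if "IMAGE" in supported:
        return media.find("location/location_site").get("href")
    path = "modality_transcoding/modality_object_set/modality_object"
    for mo in media.iterfind(path):
        if mo.get("type") in supported:
            return mo.find("location/location_site").get("href")
    raise LookupError("no version matches the terminal's capabilities")


print(pick_version(OBJECT_XML, {"IMAGE"}))  # Hi.gif
print(pick_version(OBJECT_XML, {"AUDIO"}))  # location of the audio version
```

A real broadcaster would also weigh the resolution attribute against the terminal's bandwidth; that step is left out of this sketch.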



Fig. 5 depicts a block diagram of an exemplary computer system for
implementing the image description system of the present invention. The computer
system depicted includes a computer processor section 402, which receives digital data
representing image content, via image input interface 404 for example. Alternatively,
the digital image data can be transferred to the processor section 402 from a remote
source via a bidirectional communications input/output (I/O) port 406. The image
content can also be transferred to the processor section 402 from non-volatile
computer media 408, such as any of the optical or magnetic data storage
systems well known in the art. The processor section 402 provides data to an image
display system 410, which generally includes appropriate interface circuitry and a
high-resolution monitor, such as a standard SVGA monitor and video card of the kind
commonly employed in conventional personal computer systems and workstations.
A user input device 412, such as a keyboard and a digital pointing device (e.g., a
mouse, trackball, light pen or touch screen), is coupled to the processor
section 402 to effect the user's interaction with the computer system. The exemplary
computer system of Fig. 5 will also normally include volatile and non-volatile
computer memory 414, which can be accessed by the processor section 402 during
processing operations.

Fig. 6 depicts a flow chart diagram which further illustrates the
processing operations undertaken by the computer system depicted in Fig. 5 for
purposes of implementing the image description system of the present invention.
Digital image data 310 is applied to the computer system via link 311. The computer
system, under the control of suitable application software, performs image object
extraction in block 320, in which image objects 30 and associated features, such as
media features 36, visual features 38 and semantic features 40 for example, are
generated. Image object extraction 320 may take the form of a fully automatic
processing operation, a semi-automatic processing operation, or a substantially
manual operation in which objects are defined primarily through user interaction, such
as via user input device 412 for example.

In a preferred embodiment, image object extraction 320 consists of two 
subsidiary operations, namely image segmentation as depicted by block 325, and 
feature extraction and annotation as depicted by block 326. For the image 
segmentation 325 step, any region tracking technique which partitions digital images 
into regions that share one or more common characteristics may be employed. 
Likewise, for the feature extraction and annotation step 326, any technique which 
generates features from segmented regions may be employed. A region-based 
clustering and searching subsystem is suitable for automated image segmentation and 
feature extraction. An image object segmentation system is an example of a semi- 
automated image segmentation and feature extraction system. Manual segmentation 
and feature extraction could alternatively be employed. In an exemplary system, 
image segmentation 325 may for example generate image objects 30, and feature 
extraction and annotation 326 may for example generate the features associated with 
the image objects 30, such as media features 36, visual features 38 and semantic 
features 40, for example. 

The object extraction processing 320 generates an image object set 24, 
which contains one or more image objects 30. The image objects 30 of the image 
object set 24 may then be provided via links 321, 322 and 324 for further processing 
in the form of object hierarchy construction and extraction processing as depicted in 
block 330, and/or entity relation graph generation processing as depicted in block 336. 
Preferably, object hierarchy construction and extraction 330 and entity relation graph 
generation 336 take place in parallel and via link 327. Alternatively, image objects 30 
of the image object set 24 may be directed to bypass object hierarchy construction and 
extraction 330 and entity relation graph generation 336, via link 323. The object 
hierarchy construction and extraction 330 thus generates one or more object 
hierarchies 26, and the entity relation graph generation 336 thus generates one or more
entity relation graphs 28.

The processor section 402 then merges the image object set 24, object
hierarchies 26 and entity relation graphs 28 into an image description record for the
image content in question. The image description record may then be stored directly
in database storage 340, or alternatively may first be subjected to compression by
binary encoder 360 via links 342 and 361, or to encoding in a description definition
language (XML, for example) by XML encoder 350 via links 341 and
351. Once the image description records have been stored in database storage 340,
they remain available in a useful format for access and use by other applications 370,
such as search, filter and archiving applications for example, via bidirectional link
371.
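
The data flow of Fig. 6 can be summarized in a short Python sketch. Everything here is a simplified stand-in: the record layout and function names are hypothetical, JSON plus `zlib` compression replaces binary encoder 360, and an in-memory dictionary replaces database storage 340. None of these choices is specified by the text.

```python
import json
import zlib
from dataclasses import asdict, dataclass, field


@dataclass
class DescriptionRecord:
    """Simplified image description record (hypothetical layout)."""
    objects: list[str]                                       # image object set 24
    hierarchies: list[dict] = field(default_factory=list)    # object hierarchies 26
    graphs: list[list[tuple]] = field(default_factory=list)  # entity relation graphs 28


def describe(object_ids, hierarchy=None, relations=None) -> DescriptionRecord:
    """Merge the outputs of blocks 320, 330 and 336 into one record.

    Both structure-building steps are optional, mirroring link 323, which
    lets image objects bypass them.
    """
    record = DescriptionRecord(objects=list(object_ids))
    if hierarchy:
        record.hierarchies.append(hierarchy)
    if relations:
        record.graphs.append(relations)
    return record


def binary_encode(record: DescriptionRecord) -> bytes:
    """Stand-in for binary encoder 360: JSON text compressed with zlib."""
    return zlib.compress(json.dumps(asdict(record)).encode("utf-8"))


database: dict[str, bytes] = {}  # stand-in for database storage 340
record = describe(["O0", "O1", "O2"], {"O0": ["O1", "O2"]}, [("Left Of", "O1", "O2")])
database["photo-1"] = binary_encode(record)

# An application 370 reading the record back.
restored = json.loads(zlib.decompress(database["photo-1"]).decode("utf-8"))
print(restored["objects"], restored["hierarchies"])
```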

Referring to Fig. 7, an exemplary embodiment of a client-server
computer system on which the image description system of the present invention can
be implemented is provided. The architecture of the system 100 includes a client
computer 110 and a server computer 120. The server computer 120 includes a display
interface 130, a query dispatcher 140, a performance database 150, query translators
160, 161, 165, target search engines 170, 171, 175, and multimedia content
description systems 200, 201, 205, which will be described in further detail below.

While the following disclosure will make reference to this exemplary
client-server embodiment, those skilled in the art should understand that the particular
system arrangement may be modified within the scope of the invention to include
numerous well-known local or distributed architectures. For example, all
functionality of the client-server system could be included within a single computer,
or a plurality of server computers could be utilized with shared or separated
functionality.

Commercially available metasearch engines act as gateways linking
users automatically and transparently to multiple text-based search engines. The
system of Fig. 7 builds upon the architecture of such metasearch engines and is
designed to intelligently select and interface with multiple on-line multimedia search
engines by ranking their performance for different classes of user queries.
Accordingly, the query dispatcher 140, query translators 160, 161, 165, and display
interface 130 of commercially available metasearch engines may be employed in the
present invention.

The dispatcher 140 selects the target search engines to be queried by
consulting the performance database 150 upon receiving a user query. This database
150 contains performance scores of past query successes and failures for each
supported search option. The query dispatcher selects only search engines 170, 171,
175 that are able to satisfy the user's query; e.g., a query seeking color information will
trigger color-enabled search engines. Search engines 170, 171, 175 may, for example,
be arranged in a client-server relationship, such as search engine 170 and associated
client 172.

The query translators 160, 161, 165 translate the user query into suitable
scripts conforming to the interfaces of the selected search engines. The display
component 130 uses the performance scores to merge the results from each search
engine, and presents them to the user.

In accordance with the present invention, in order to permit a user to 
intelligently search the Internet or a regional or local network for visual content, 
search queries may be made either by descriptions of multimedia content generated by 
the present invention, or by example or sketch. Each search engine 170, 171, 175 
employs a description scheme, for example the description schemes described below, 
to describe the contents of multimedia information accessible by the search engine 
and to implement the search. 

In order to implement a content-based search query for multimedia
information, the dispatcher 140 will match the query description, through the
multimedia content description system 200 employed by each search engine 170,
171, 175, to ensure the satisfaction of the user preferences in the query. It will then
select the target search engines 170, 171, 175 to be queried by consulting the
performance database 150. If, for example, the user wants to search by color and one
search engine does not support any color descriptors, it will not be useful to query that
particular search engine.
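
The dispatcher's two-stage selection (a capability filter, then ranking by past performance) can be sketched in Python as follows. The engine names, capability sets, the scores in the hypothetical `PERFORMANCE` table, and the averaging rule are illustrative assumptions, not values or formulas from the text.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    name: str
    descriptors: set[str]  # feature classes the engine can search on


# Hypothetical performance database 150: past success rate per
# (engine, search option) pair.
PERFORMANCE = {
    ("engine170", "color"): 0.8,
    ("engine171", "color"): 0.6,
    ("engine175", "texture"): 0.7,
}


def dispatch(query_features: set[str], engines: list[Engine], k: int = 2) -> list[str]:
    """Return up to k engines able to satisfy the query, best first.

    Engines lacking any queried feature are dropped; the rest are ranked by
    their mean past performance on the queried features (an assumed rule).
    """
    if not query_features:
        return []
    capable = [e for e in engines if query_features <= e.descriptors]

    def score(engine: Engine) -> float:
        total = sum(PERFORMANCE.get((engine.name, f), 0.0) for f in query_features)
        return total / len(query_features)

    return [e.name for e in sorted(capable, key=score, reverse=True)[:k]]


engines = [
    Engine("engine170", {"color", "texture"}),
    Engine("engine171", {"color"}),
    Engine("engine175", {"texture", "shape"}),
]
print(dispatch({"color"}, engines))  # ['engine170', 'engine171']
```

A query on color and shape together returns an empty list here, since no engine in the example supports both; this matches the point that an engine lacking a required descriptor is not worth querying.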

Next, the query translators 160, 161, 165 will adapt the query
description to descriptions conforming to each selected search engine. This translation
will also be based on the description schemes available from each search engine. This
task may require executing extraction code for standard descriptors, or extraction code
downloaded from specific search engines, to transform descriptors. For example, if
the user specifies the color feature of an object using a color coherence of 166 bins,
the query translator will translate it to the specific color descriptors used by each
search engine, e.g., color coherence and color histogram of x bins.

Before displaying the results to the user, the query interface will merge
the results from each search option by translating all the result descriptions into a
homogeneous one for comparison and ranking. Again, similarity code for standard
descriptors or downloaded similarity code from search engines may need to be
executed. User preferences will determine how the results are displayed to the user.
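
One simple way to make results from different engines comparable before ranking is sketched below in Python. The per-engine min-max normalization and the weighting by performance score are assumed fusion rules chosen for illustration; the text itself leaves the merging method to the similarity code and user preferences.

```python
def merge_results(per_engine: dict[str, list[tuple[str, float]]],
                  weights: dict[str, float]) -> list[str]:
    """Merge per-engine result lists into one homogeneous ranking.

    Each engine reports (item, distance) pairs on its own scale, so distances
    are min-max normalized per engine into similarities in [0, 1] and then
    scaled by that engine's performance weight (an assumed fusion rule).
    """
    best: dict[str, float] = {}
    for engine, results in per_engine.items():
        if not results:
            continue
        distances = [d for _, d in results]
        lo, hi = min(distances), max(distances)
        span = (hi - lo) or 1.0  # all distances equal: avoid division by zero
        for item, d in results:
            similarity = weights.get(engine, 0.0) * (1.0 - (d - lo) / span)
            best[item] = max(best.get(item, 0.0), similarity)  # keep strongest evidence
    return sorted(best, key=best.get, reverse=True)


per_engine = {
    "engine170": [("a.jpg", 2.0), ("b.jpg", 10.0)],
    "engine171": [("c.jpg", 0.1), ("a.jpg", 0.5)],
}
weights = {"engine170": 0.8, "engine171": 0.6}  # hypothetical performance scores
print(merge_results(per_engine, weights))  # ['a.jpg', 'c.jpg', 'b.jpg']
```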

Referring next to Fig. 8, a description system 200 which, in accordance
with the present invention, is employed by each search engine 170, 171, 175 is now
described. In the preferred embodiment disclosed herein, XML is used to describe
multimedia content.

The description system 200 advantageously includes several
multimedia processing, analysis and annotation subsystems 210, 220, 230, 240, 250,
260, 270, 280 to generate a rich variety of descriptions for a collection of multimedia
items 205. Each subsystem is described in turn.

The first subsystem 210 is a region-based clustering and searching
system which extracts visual features such as color, texture, motion, shape, and size
for automatically segmented regions of a video sequence. The system 210
decomposes video into separate shots by detecting scene changes, which may be either
abrupt or transitional (e.g., dissolve, fade in/out, wipe). For each shot, the system 210
estimates both global motion (i.e., the motion of the dominant background) and camera
motion, and then segments, detects, and tracks regions across the frames in the shot,
computing different visual features for each region. For each shot, the description
generated by this system is a set of regions with visual and motion features, and the
camera motion. A complete description of the region-based clustering and searching
system 210 is contained in co-pending PCT Application Serial No.
PCT/US98/09124, filed May 5, 1998, entitled "An Algorithm and System
Architecture for Object-Oriented Content-Based Video Search," the contents of which
are incorporated by reference herein.
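
The text does not specify how subsystem 210 detects scene changes. Purely to illustrate the abrupt case, the Python sketch below uses a common, simpler technique in its place: comparing the color histograms of consecutive frames and declaring a cut when their distance exceeds a threshold. The threshold value and the three-bin histograms are hypothetical, and gradual transitions are not handled.

```python
def detect_cuts(histograms: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Return the indices of frames that start a new shot.

    Uses half the L1 distance between consecutive normalized color
    histograms (a value in [0, 1]); a jump above `threshold` is treated as
    an abrupt cut. Dissolves, fades and wipes need more elaborate handling.
    """
    cuts = []
    for i in range(1, len(histograms)):
        prev, cur = histograms[i - 1], histograms[i]
        distance = 0.5 * sum(abs(a - b) for a, b in zip(prev, cur))
        if distance > threshold:
            cuts.append(i)
    return cuts


frames = [
    [0.90, 0.10, 0.00], [0.85, 0.15, 0.00],  # shot 1: mostly bin 0
    [0.00, 0.20, 0.80], [0.05, 0.20, 0.75],  # shot 2: mostly bin 2
]
print(detect_cuts(frames))  # [2]
```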

As used herein, a "video clip" shall refer to a sequence of frames of 
video information having one or more video objects having identifiable attributes, 
such as, by way of example and not of limitation, a baseball player swinging a bat, a 
surfboard moving across the ocean, or a horse running across a prairie. A "video 
object" is a contiguous set of pixels that is homogeneous in one or more features of 
interest, e.g., texture, color, motion or shape. Thus, a video object is formed by one 
or more video regions which exhibit consistency in at least one feature. For example,
a shot of a person walking (the person is the "object" here) would be segmented into a
collection of adjoining regions differing in criteria such as shape, color and texture,
but all the regions may exhibit consistency in their motion attribute.

The second subsystem 220 is an MPEG domain face detection system, 
which efficiently and automatically detects faces directly in the MPEG compressed 
domain. The human face is an important subject in images and video. It is ubiquitous 
in news, documentaries, movies, etc., providing key information to the viewer for the 
understanding of the video content. This system provides a set of regions with face 
labels. A complete description of the system 220 is contained in PCT Application 
Serial No. PCT/US 97/20024, filed November 4, 1997, entitled "A Highly Efficient 
System for Automatic Face Region Detection in MPEG Video," the contents of which 
are incorporated by reference herein. 

The third subsystem 230 is a video object segmentation system in 
which automatic segmentation is integrated with user input to track semantic objects 
in video sequences. For general video sources, the system allows users to define an 
approximate object boundary by using a tracing interface. Given the approximate 
object boundary, the system automatically refines the boundary and tracks the 
movement of the object in subsequent frames of the video. The system is robust 
enough to handle many real-world situations that are difficult to model using existing 
approaches, including complex objects, fast and intermittent motion, complicated 
backgrounds, multiple moving objects and partial occlusion. The description
generated by this system is a set of semantic objects with the associated regions and
features that can be manually annotated with text. A complete description of the
system 230 is contained in U.S. Patent Application Serial No. 09/405,555, filed
September 24, 1998, entitled "An Active System and Algorithm for Semantic Video
Object Segmentation," the contents of which are incorporated by reference herein.

The fourth subsystem 240 is a hierarchical video browsing system that
parses compressed MPEG video streams to extract shot boundaries, moving objects,
object features, and camera motion. It also generates a hierarchical shot-based
browsing interface for intuitive visualization and editing of videos. A complete
description of the system 240 is contained in PCT Application Serial No. PCT/US
97/08266, filed May 16, 1997, entitled "Efficient Query and Indexing Methods for
Joint Spatial/Feature Based Image Search," the contents of which are incorporated by
reference herein.

The fifth subsystem 250 handles the entry of manual text annotations. It is
often desirable to integrate visual features and textual features for scene classification.
For images from on-line news sources, e.g., Clarinet, there is often textual information
in the form of captions or articles associated with each image. This textual
information can be included in the descriptions.

The sixth subsystem 260 is a system for high-level semantic
classification of images and video shots based on low-level visual features. The core
of the system consists of various machine learning techniques such as rule induction,
clustering and nearest neighbor classification. The system is being used to classify
images and video scenes into high-level semantic scene classes such as {nature
landscape}, {city/suburb}, {indoor}, and {outdoor}. The system focuses on machine
learning techniques because we have found that a fixed set of rules that might work
well with one corpus may not work well with another corpus, even for the same set of
semantic scene classes. Since the core of the system is based on machine learning
techniques, the system can be adapted to achieve high performance for different
corpora by training the system with examples from each corpus. The description
generated by this system is a set of text annotations to indicate the scene class for each
image or each keyframe associated with the shots of a video sequence. A complete
description of the system 260 is contained in S. Paek et al., "Integration of Visual and
Text Based Approaches for the Content Labeling and Classification of Photographs,"
ACM SIGIR'99 Workshop on Multimedia Indexing and Retrieval, Berkeley, CA
(1999), the contents of which are incorporated by reference herein.
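
Of the learning techniques listed for subsystem 260, nearest neighbor classification is the easiest to illustrate. The Python sketch below labels a query image with the scene class of its closest training example under Euclidean distance. The three-component feature vectors and their values are hypothetical stand-ins for real low-level visual features, and the sketch is not the cited system.

```python
import math

# Hypothetical training set: three made-up low-level features per image
# (for instance color and edge statistics) paired with a scene-class label.
TRAINING = [
    ((0.70, 0.20, 0.05), "nature landscape"),
    ((0.10, 0.15, 0.80), "city/suburb"),
    ((0.05, 0.60, 0.40), "indoor"),
]


def nearest_neighbor_label(query, training):
    """Return the label of the training example closest to `query`
    under Euclidean distance (1-nearest-neighbor classification)."""
    best_label, best_dist = None, math.inf
    for features, label in training:
        dist = math.dist(query, features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


print(nearest_neighbor_label((0.65, 0.25, 0.10), TRAINING))  # nature landscape
```

In this sketch, adapting the classifier to a new corpus amounts to replacing `TRAINING` with labeled examples drawn from that corpus.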

The seventh subsystem 270 is a model-based image classification
system. Many automatic image classification systems are based on a pre-defined set
of classes in which class-specific algorithms are used to perform classification. The
system 270 allows users to define their own classes and provide examples that are
used to automatically learn visual models. The visual models are based on
automatically segmented regions, their associated visual features, and their spatial
relationships. For example, the user may build a visual model of a portrait in which
one person wearing a blue suit is seated on a brown sofa, and a second person is
standing to the right of the seated person. The system uses a combination of lazy
learning, decision trees and evolution programs during classification. The description
generated by this system is a set of text annotations, i.e., the user-defined classes, for
each image. A complete description of the system 270 is contained in PCT
Application Serial No. PCT/US 97/08266, filed May 16, 1997, entitled "A Method
and Architecture for Indexing and Editing Compressed Video Over the World Wide
Web," the contents of which are incorporated by reference herein.

Other subsystems 280 may be added to the multimedia content
description system 200, such as subsystems from collaborators used to generate
descriptions or parts of descriptions, for example.

In operation, the image and video content 205 may be a database of
still images or moving video, a buffer receiving content from a browser interface 206,
or a receptacle for live image or video transmission. The subsystems 210, 220, 230,
240, 250, 260, 270, 280 operate on the image and video content 205 to generate
descriptions 211, 221, 231, 241, 251, 261, 271, 281 that include low-level visual
features of automatically segmented regions, user-defined semantic objects, high-level
scene properties, classifications and associated textual information, as described
above. Once all the descriptions for an image or video item are generated and
integrated in block 290, the descriptions are then input into a database 295, which the
search engine 170 accesses.

It should be noted that certain of the subsystems, namely the region-based
clustering and searching subsystem 210 and the video object segmentation system 230,
may implement the entire description generation process, while the remaining
subsystems implement only portions of the process and may be called on by the
subsystems 210, 230 during processing. In a similar manner, the subsystems 210 and
230 may be called on by each other for specific tasks in the process.

In Figures 1-6, systems and methods for describing image content are
described. These techniques are readily extensible to video content as well. The
performance of systems for searching and processing video content information can
benefit from the creation and adoption of a standard by which such video content can
be thoroughly and efficiently described. As used herein, the term "video clip" refers
to an arbitrary duration of video content, such as a sequence of frames of video
information. The term "description scheme" refers to the data structure or organization
used to describe the video content. The term "description record" refers to the
description scheme wherein the data fields of the data structure are defined by data
which describes the content of a particular video clip.

Referring to Figure 9, an exemplary embodiment of the present video
description scheme (DS) is illustrated in schematic form. The video DS inherits all of
the elements of the image description scheme and adds temporal elements thereto,
which are particular to video content. Thus, a video element 922, which represents a
video description, generally includes a video object set 924, an object hierarchy
definition 926 and entity relation graphs 928, all of which are similar to those
described in connection with Figure 2. An exemplary video DS definition is
illustrated below in Table 5.



Table 5: Elements in the Video Description Scheme (DS).

Element                        Contains                           May be Contained in

video                          video_object_set (1)               (root element)
                               object_hierarchy¹ (0..*)
                               entity_relation_graph¹ (0..*)

video_object_set               video_object (1..*)                video

video_object                   vid_obj_media_features (0..1)      video_object_set
                               vid_obj_semantic_features (0..1)
                               vid_obj_visual_features (0..1)
                               vid_obj_temporal_features (0..1)

vid_obj_media_features         location¹ (0..1)                   video_object
                               file_format¹ (0..1)
                               file_size¹ (0..1)
                               resolution¹ (0..1)
                               modality_transcoding¹ (0..1)
                               bit_rate (0..1)

vid_obj_semantic_features      text_annotation¹ (0..1)            video_object
                               who, what_object, what_action,
                               when, where, why (0..1)

vid_obj_visual_features        image_scl¹ (0..1)                  video_object
(type="LOCAL")                 color¹ (0..1)
                               texture¹ (0..1)
                               shape¹ (0..1)
                               size¹ (0..1)
                               position¹ (0..1)
                               motion (0..1)

vid_obj_visual_features        video_scl (0..1)                   video_object
(type="GLOBAL")                visual_sprite (0..1)
                               transition (0..1)
                               camera_motion (0..1)
                               size (0..1)
                               key_frame (0..*)

vid_obj_temporal_features      time (0..1)                        video_object

object_hierarchy¹              object_node¹ (1)                   video

object_node¹                   object_node¹ (0..*)                object_hierarchy¹,
                                                                  object_node¹

entity_relation_graph¹         entity_relation¹ (1..*)            video

entity_relation¹               relation¹ (1)                      entity_relation_graph¹
                               entity_node¹ (1..*)
                               entity_relation¹ (0..*)

¹ Defined in the Image DTD.



A basic element of the present video description scheme (DS) is the
video object (<video_object>) 930. A video object 930 refers to one or more arbitrary
regions in one or more frames of a video clip. For example, and not by way of
limitation, a video object may be defined as a local object, a segment object or a
global object. Local objects refer to a group of pixels found in one or more frames.
Segment objects refer to one or more related frames of the video clip. Global objects
refer to the entire video clip.

A video object 930 is an element of the video object set 924 and can
be related to other objects in the object set 924 by the object hierarchy 926 and entity
relation graphs 928 in the same manner as described in connection with Figs. 1-6.
Again, the fundamental difference between the video description scheme and the
previously described image description scheme lies in the inclusion of temporal
parameters, which further define the video objects and their interrelations in the
description scheme.

When XML is used to implement the present video description scheme, a
video object can include a "semantic" attribute, which takes an indicative value such
as true or false, to indicate whether the object has associated semantic information.
To indicate whether the object has associated physical information (such as color,
shape, time, motion and position), the object can include an optional "physical"
attribute that can take an indicative value such as true or false. To indicate
whether regions of an object are spatially adjacent to one another (continuous in
space), the object can include an optional "spaceContinuous" attribute that can assume
a value such as true or false. To indicate whether the video frames which contain a
particular object are temporally adjacent to one another (continuous in time), the
object can further include an optional "timeContinuous" attribute, which can likewise
assume an indicative value such as true or false. To distinguish whether the object
refers to a region within select frames of a video, to entire frames of the video, or to
the entire video (e.g., shots, scenes, stories), the object will generally include a type
attribute that can take the indicative values LOCAL, SEGMENT and GLOBAL,
respectively.
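As a minimal sketch of how these attributes might be attached to a video object, the Python snippet below uses the standard library's xml.etree.ElementTree. The helper name make_video_object is hypothetical, and encoding the indicative values as the lowercase strings "true" and "false" is an assumption; the description only says that the attributes take a value such as true or false.

```python
import xml.etree.ElementTree as ET

# Allowed values of the required "type" attribute, per the description above.
OBJECT_TYPES = ("LOCAL", "SEGMENT", "GLOBAL")


def make_video_object(obj_id, obj_type, semantic=None, physical=None,
                      space_continuous=None, time_continuous=None):
    """Build a <video_object> element (hypothetical helper).

    Flags left as None are omitted, mirroring the optional attributes.
    """
    if obj_type not in OBJECT_TYPES:
        raise ValueError(f"type must be one of {OBJECT_TYPES}, got {obj_type!r}")
    elem = ET.Element("video_object", id=obj_id, type=obj_type)
    flags = {
        "semantic": semantic,
        "physical": physical,
        "spaceContinuous": space_continuous,
        "timeContinuous": time_continuous,
    }
    for name, value in flags.items():
        if value is not None:
            # Assumed encoding of the indicative value as "true" / "false".
            elem.set(name, "true" if value else "false")
    return elem


# Narrator A: a local object that is continuous in time and space.
narrator_a = make_video_object("O2", "LOCAL", semantic=True,
                               space_continuous=True, time_continuous=True)
print(ET.tostring(narrator_a, encoding="unicode"))
```

Rejecting unknown type values in the helper mirrors the enumerated (LOCAL|SEGMENT|GLOBAL) attribute type used in the DTD, so malformed objects are caught before a description record is written.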

Figure 10 is a pictorial diagram which depicts a video clip in which a
number of exemplary objects are identified. Object O0 is a global object which refers
to the entire video clip. Object O1, the library, refers to an entire frame of video and
would be classified as a segment type object. Objects O2 and O3 are local objects
which refer to narrator A and narrator B, respectively; they are person objects that are
continuous in time and space. Object O4 ("Narrators") is a local video object,
composed of objects O2 and O3, which is discontinuous in space. Figure 10 further
illustrates that objects can be nested. For example, object O1, the library, includes
local object O2, and both of these objects are contained within the global object O0.
An XML description of the objects defined in Figure 10 is set forth below.

<video_object_set>
  <video_object id="O0" type="GLOBAL"> </video_object> <!-- Documentary -->
  <video_object id="O1" type="SEGMENT"> </video_object> <!-- Library -->
  <video_object id="O2" type="LOCAL"> </video_object> <!-- Narrator A -->
  <video_object id="O3" type="LOCAL"> </video_object> <!-- Narrator B -->
  <video_object id="O4" type="LOCAL"> </video_object> <!-- Narrators -->
</video_object_set>
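The object set of Figure 10 can be consumed mechanically by any XML parser. The sketch below parses it with Python's xml.etree.ElementTree and groups the objects by their type attribute; the grouping function objects_by_type is a hypothetical illustration of how an application might use the object set, not part of the description scheme itself.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The object set for Figure 10 (comments are skipped by the parser).
VIDEO_OBJECT_SET = """
<video_object_set>
  <video_object id="O0" type="GLOBAL"></video_object> <!-- Documentary -->
  <video_object id="O1" type="SEGMENT"></video_object> <!-- Library -->
  <video_object id="O2" type="LOCAL"></video_object> <!-- Narrator A -->
  <video_object id="O3" type="LOCAL"></video_object> <!-- Narrator B -->
  <video_object id="O4" type="LOCAL"></video_object> <!-- Narrators -->
</video_object_set>
"""


def objects_by_type(xml_text):
    """Map each type value (LOCAL, SEGMENT, GLOBAL) to its object ids."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for obj in root.iter("video_object"):
        groups[obj.get("type")].append(obj.get("id"))
    return dict(groups)


print(objects_by_type(VIDEO_OBJECT_SET))
# {'GLOBAL': ['O0'], 'SEGMENT': ['O1'], 'LOCAL': ['O2', 'O3', 'O4']}
```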



Figure 11 illustrates how two or more video objects are related through
the object hierarchy 926. In this case, objects O2 and O3 share a common semantic
feature: their "what object" is a narrator. Thus, these objects can be referenced in the
definition of a new object, O4 (narrators), via an object hierarchy definition. The
details of such a hierarchical definition follow those described in connection with
Fig. 3A.
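A minimal sketch of the Figure 11 hierarchy, written against the object_hierarchy and object_node declarations of the DTD, is shown below. The node identifiers (on2, on3, on4), the hierarchy id oh1 and the type value "semantic" are hypothetical; the DTD only declares type as free CDATA. The helper children_of reads the hierarchy back as a parent-to-children mapping of object ids.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy for Figure 11: the "Narrators" object O4 groups
# Narrator A (O2) and Narrator B (O3). Node ids and the type value are illustrative.
HIERARCHY = """
<object_hierarchy id="oh1" type="semantic">
  <object_node id="on4" object_ref="O4">
    <object_node id="on2" object_ref="O2"/>
    <object_node id="on3" object_ref="O3"/>
  </object_node>
</object_hierarchy>
"""


def children_of(hierarchy_xml):
    """Map each referenced object id to the object ids of its child nodes."""
    root = ET.fromstring(hierarchy_xml)
    result = {}
    # iter() walks the tree in document order, so parents precede children.
    for node in root.iter("object_node"):
        result[node.get("object_ref")] = [
            child.get("object_ref") for child in node.findall("object_node")
        ]
    return result


print(children_of(HIERARCHY))  # {'O4': ['O2', 'O3'], 'O2': [], 'O3': []}
```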
Figure 12 illustrates how the entity relation graph in the video
description scheme can relate video objects. In this case, two relationships are shown
between objects O2 and O3. The first is a semantic relationship, "colleague of," which
is equivalent to the type of semantic relationship that could be present in the case of
the image description scheme, as described in connection with Figure 1C. Figure 12
further shows a temporal relationship between objects O2 and O3. In this case, object
O2 precedes object O3 in time within the video clip; thus, the temporal relationship
"before" can be applied. In addition to the exemplary relation types and relations set
forth in connection with the image description scheme, the video description scheme
can employ the relation types and relations set forth in the table below.
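The two relations of Figure 12 can be sketched as an entity relation graph that follows the entity_relation_graph, entity_relation, relation and entity_node declarations of the DTD. The type strings ("SEMANTIC", "TEMPORAL.DIRECTIONAL") are taken from the examples listed in the DTD comments; reading the order of entity nodes as the direction of the relation (O2 before O3) is an assumption, as are the helper names make_relation and relations_between.

```python
import xml.etree.ElementTree as ET


def make_relation(graph, rel_type, relation, *object_refs):
    """Append an <entity_relation> holding one <relation> and its <entity_node>s."""
    er = ET.SubElement(graph, "entity_relation", type=rel_type)
    ET.SubElement(er, "relation").text = relation
    for ref in object_refs:
        ET.SubElement(er, "entity_node", object_ref=ref)
    return er


# Figure 12: two relations between Narrator A (O2) and Narrator B (O3).
graph = ET.Element("entity_relation_graph", id="erg1")
make_relation(graph, "SEMANTIC", "colleague of", "O2", "O3")
make_relation(graph, "TEMPORAL.DIRECTIONAL", "before", "O2", "O3")


def relations_between(graph, a, b):
    """List (type, relation) pairs whose entity nodes are exactly a then b."""
    found = []
    for er in graph.findall("entity_relation"):
        refs = [node.get("object_ref") for node in er.findall("entity_node")]
        if refs == [a, b]:
            found.append((er.get("type"), er.findtext("relation")))
    return found


print(relations_between(graph, "O2", "O3"))
# [('SEMANTIC', 'colleague of'), ('TEMPORAL.DIRECTIONAL', 'before')]
```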



| RELATION TYPE | RELATIONS |
|---|---|
| TEMPORAL-Directional | Before, After, Immediately Before, Immediately After |
| TEMPORAL-Topological | Co-Begin, Co-End, Parallel, Sequential, Overlap, Within, Contain, Nearby |
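The table names the temporal relations but does not define them. One plausible reading, assumed here rather than taken from the description, treats each video object as a closed time interval and tests the relations by comparing interval end points. Parallel, Sequential and Nearby are left out because they would need extra parameters (for example a proximity threshold) that the table does not specify; the Interval class and the sample times are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Time span of a video object, in seconds (units are an assumption)."""
    start: float
    end: float


def temporal_relations(a: Interval, b: Interval) -> set:
    """Return the relation names from the table that hold for a relative to b.

    The definitions are illustrative assumptions, not taken from the DS.
    """
    rels = set()
    # Directional relations
    if a.end <= b.start:
        rels.add("Before")
    if a.end == b.start:
        rels.add("Immediately Before")
    if b.end <= a.start:
        rels.add("After")
    if b.end == a.start:
        rels.add("Immediately After")
    # Topological relations
    if a.start == b.start:
        rels.add("Co-Begin")
    if a.end == b.end:
        rels.add("Co-End")
    if a.start < b.end and b.start < a.end:
        rels.add("Overlap")
    if b.start <= a.start and a.end <= b.end:
        rels.add("Within")
    if a.start <= b.start and b.end <= a.end:
        rels.add("Contain")
    return rels


narrator_a = Interval(0.0, 12.5)   # hypothetical times
narrator_b = Interval(12.5, 30.0)
print(sorted(temporal_relations(narrator_a, narrator_b)))
# ['Before', 'Immediately Before']
```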



The video objects 930 can be further characterized in terms of object
features. Although any number and type of features can be defined to characterize the
video objects in a modular and extensible manner, a useful exemplary feature set can
include semantic features 940, visual features 938, media features 936 and temporal
features 937. Each feature can then be further defined by feature parameters, or
descriptors. Such descriptors generally follow those described in connection with
the image description scheme, with the addition of the requisite temporal information.
For example, visual features 938 can include a set of descriptors such as shape, color,
texture and position, as well as motion parameters. Temporal features 937
generally include such descriptors as start time, end time and duration. Table 6 shows
examples of descriptors, in addition to those set forth in connection with the image
description scheme, that can belong to each of these exemplary classes of features.
Table 6: Feature classes and features.

| Feature Class | Features |
|---|---|
| Visual | Motion, Editing Effect, Camera Motion |
| Temporal | Start Time, End Time, Duration |
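Because duration equals end time minus start time, a description generator need only measure two of the three temporal features. The sketch below assumes times in seconds derived from an inclusive frame range at a known frame rate; the TemporalFeatures class and its from_frames helper are hypothetical and do not reflect the actual structure of the DTD's time element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalFeatures:
    """Start time, end time and duration of a video object (seconds, assumed)."""
    start_time: float
    end_time: float

    def __post_init__(self):
        if self.end_time < self.start_time:
            raise ValueError("end_time must not precede start_time")

    @property
    def duration(self) -> float:
        # Derived rather than stored, so the three values cannot disagree.
        return self.end_time - self.start_time

    @classmethod
    def from_frames(cls, first_frame: int, last_frame: int, fps: float):
        """Convert an inclusive frame range to times (hypothetical helper)."""
        return cls(first_frame / fps, (last_frame + 1) / fps)


segment = TemporalFeatures.from_frames(0, 299, fps=30.0)
print(segment.start_time, segment.end_time, segment.duration)  # 0.0 10.0 10.0
```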



In summary, in a fashion analogous to the previously described image
description scheme, the present video description scheme includes a video object set
924, an object hierarchy 926 and entity relation graphs 928. Video objects 930 are
further defined by features. The objects 930 within the object set 924 can be related
hierarchically by one or more object hierarchy nodes 932 and references 933.
Relations between objects 930 can also be expressed in entity relation graphs 928,
which further include entity relations 934, entity nodes 942, references 943 and
relations 944, all of which correspond substantially to the elements described in
connection with Figure 2. Each video object 930 preferably includes features that can
link to external extraction and similarity matching code.
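The summary above describes a data structure with three parts that reference one another by object identifier. A minimal in-memory sketch using Python dataclasses follows; the class and field names are hypothetical, and the dangling_refs check is an added illustration of keeping hierarchy and graph references consistent with the object set, analogous to ID/IDREF matching in the XML form.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VideoObject:
    id: str
    type: str  # "LOCAL", "SEGMENT" or "GLOBAL"
    features: Dict[str, dict] = field(default_factory=dict)  # e.g. {"temporal": {...}}


@dataclass
class ObjectNode:
    object_ref: str  # id of the video object this node points at
    children: List["ObjectNode"] = field(default_factory=list)


@dataclass
class EntityRelation:
    type: str
    relation: str
    object_refs: List[str]


@dataclass
class DescriptionRecord:
    object_set: List[VideoObject]
    hierarchies: List[ObjectNode] = field(default_factory=list)  # root nodes
    relation_graphs: List[List[EntityRelation]] = field(default_factory=list)

    def dangling_refs(self) -> List[str]:
        """Return references that do not match any object in the object set."""
        ids = {obj.id for obj in self.object_set}
        refs = []
        stack = list(self.hierarchies)
        while stack:  # depth-first walk over every hierarchy node
            node = stack.pop()
            refs.append(node.object_ref)
            stack.extend(node.children)
        for graph in self.relation_graphs:
            for rel in graph:
                refs.extend(rel.object_refs)
        return [r for r in refs if r not in ids]


record = DescriptionRecord(
    object_set=[VideoObject("O2", "LOCAL"), VideoObject("O3", "LOCAL"),
                VideoObject("O4", "LOCAL")],
    hierarchies=[ObjectNode("O4", [ObjectNode("O2"), ObjectNode("O3")])],
    relation_graphs=[[EntityRelation("SEMANTIC", "colleague of", ["O2", "O3"])]],
)
print(record.dangling_refs())  # []
```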

Figure 13 is a block diagram of an exemplary computer system for
implementing the present video description systems and methods, which is analogous
to the system described in connection with Figure 5. The system includes a computer
processor section 1302, which receives digital data representing video content, such as
via video input interface 1304. Alternatively, the digital video data can be transferred
to the processor from a remote source via a bidirectional communications input/output
port 1306. The video content can also be transferred to the processor section 1302
from computer accessible media 408, such as optical data storage systems or magnetic
storage systems which are known in the art. The processor section 1302 provides
data to a video display system 1310, which generally includes appropriate interface
circuitry and a high-resolution monitor, such as a standard SVGA monitor and video
card commonly employed in conventional personal computer systems and
workstations. A user input device 1312, such as a keyboard and a digital pointing
device (e.g., a mouse, trackball, light pen or touch screen), is operatively
coupled to the processor section 1302 to effect user interaction with the system. The
system will also generally include volatile and non-volatile computer memory 1314,
which can be accessed by the processor section during processing operations.

Figure 14 is a flow diagram which generally illustrates the processing
operations undertaken by processor section 1302 in establishing the video DS
described in connection with Figures 9-12. Digital data representing a video clip is
applied to the system, such as via video input interface 1304, and is coupled to the
processor section 1302. The processor section 1302, under the control of suitable
software, performs video object extraction processing 1402, wherein video objects
930, features 936, 937, 938, 940 and the associated descriptors are generated. Video
object extraction processing 1402 can take the form of a fully automatic processing
operation, a semi-automatic processing operation, or a substantially manual operation
in which objects are largely defined through user interaction via the user input device
1312.

The result of object extraction processing is the generation of an object
set 924, which contains one or more video objects 930 and associated object features
936, 937, 938, 940. The video objects 930 of the object set 924 are subjected to
further processing in the form of object hierarchy construction and extraction
processing 1404 and entity relation graph generation processing 1406. Preferably,
these processing operations take place in parallel. The output of object hierarchy
construction and extraction processing 1404 is an object hierarchy 926. The output of
entity relation graph generation processing 1406 is one or more entity relation graphs
928. The processor section 1302 combines the object set, object hierarchy and entity
relation graphs into a description record in accordance with the present video
description scheme for the applied video content. The description record can be stored
in database storage 1410, subjected to low-level encoding 1412 (such as binary coding)
or subjected to description language encoding 1414 (e.g., XML). Once the description
records are stored in the read/write storage 1308 in the form of a database, the data is
available in a useful format for use by other applications 1416, such as search,
filtering and archiving applications.
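The data flow of Figure 14 can be sketched in a few lines of Python. The three processing functions below are hypothetical stand-ins: the "video" is simply a list of pre-annotated (object id, type, what_object) tuples, hierarchy construction clusters objects sharing a what_object value, and graph generation chains LOCAL objects with a temporal "before" relation on the assumption that the input is already in time order. The point being illustrated is the structure: processing 1404 and 1406 both consume only the object set, so they can be submitted concurrently and then combined into one record.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_objects(video):
    """Stand-in for video object extraction processing 1402."""
    return [{"id": i, "type": t, "what_object": w} for i, t, w in video]


def build_object_hierarchy(objects):
    """Stand-in for processing 1404: cluster objects sharing a what_object
    annotation, keeping only groups with two or more members."""
    groups = {}
    for obj in objects:
        groups.setdefault(obj["what_object"], []).append(obj["id"])
    return {label: ids for label, ids in groups.items() if len(ids) > 1}


def build_entity_relation_graph(objects):
    """Stand-in for processing 1406: link consecutive LOCAL objects with a
    temporal "before" relation (assumes the input is in time order)."""
    local = [o["id"] for o in objects if o["type"] == "LOCAL"]
    return [("TEMPORAL.DIRECTIONAL", "before", a, b) for a, b in zip(local, local[1:])]


def generate_description_record(video):
    objects = extract_objects(video)
    # 1404 and 1406 depend only on the object set, so they run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        hierarchy = pool.submit(build_object_hierarchy, objects)
        graph = pool.submit(build_entity_relation_graph, objects)
        return {
            "object_set": objects,
            "object_hierarchy": hierarchy.result(),
            "entity_relation_graph": graph.result(),
        }


record = generate_description_record([
    ("O1", "SEGMENT", "library"),
    ("O2", "LOCAL", "narrator"),
    ("O3", "LOCAL", "narrator"),
])
print(record["object_hierarchy"])       # {'narrator': ['O2', 'O3']}
print(record["entity_relation_graph"])  # [('TEMPORAL.DIRECTIONAL', 'before', 'O2', 'O3')]
```

Threads are used here only to show the concurrency structure; for CPU-heavy extraction code in Python, a ProcessPoolExecutor would usually be the better fit.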
Exemplary Document Type Definition of Video Description Scheme

This section discusses one embodiment wherein XML has been used to
implement a document type definition (DTD) of the present video description scheme.
Table 5, set forth above, summarizes the DTD of the present video DS. Appendix A
includes the full listing of the DTD of the video DS. In general, a Document Type
Definition (DTD) provides a list of the elements, tags, attributes and entities
contained in the document, and their relationships to each other. In other words,
DTDs specify a set of rules for the structure of a document. DTDs may be included in
the computer data file that contains the document they describe, or they may be
located at an external uniform resource locator (URL). Such external DTDs can be
shared by different documents and Web sites. A DTD is generally included in a
document's prolog, after the XML declaration and before the actual document data
begins.

Every element type used in a valid XML document must be declared exactly once in
the DTD with an element type declaration. By convention, the root element is declared
first in the DTD; in the present video DS, the root element can be designated as the
<video> tag. An element type declaration specifies the name of a tag, the allowed
children of the tag, and whether the tag is empty. The root <video> tag can be defined
as follows:

<!ELEMENT video (video_object_set, object_hierarchy*, entity_relation_graph*)>

where the asterisk (*) indicates zero or more occurrences. In XML syntax, the plus
sign (+) indicates one or more occurrences and the question mark (?) indicates zero
or one occurrence.

In XML, all element type declarations start with <!ELEMENT and end with >.
They include the name of the tag being declared (video) and the allowed contents
(video_object_set, object_hierarchy*, entity_relation_graph*). This declaration
indicates that a video element must contain a video object set element
(<video_object_set>), followed by zero or more object hierarchy elements
(<object_hierarchy>) and zero or more entity relation graph elements
(<entity_relation_graph>).
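A DTD content model is essentially a regular expression over the sequence of child element names: a comma means "followed by", and the *, + and ? operators have their usual meaning. Python's standard library parser does not validate against a DTD, so the sketch below hand-translates the <video> content model into a Python regular expression and checks the children of a parsed element against it. This is an illustration of what a validating parser enforces, not a substitute for one; the names VIDEO_CONTENT_MODEL and children_match are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the content model
#   (video_object_set, object_hierarchy*, entity_relation_graph*)
# Each child tag is followed by a space so that names cannot run together.
VIDEO_CONTENT_MODEL = re.compile(
    r"(?:video_object_set )(?:object_hierarchy )*(?:entity_relation_graph )*"
)


def children_match(element, model):
    """Check the sequence of child tag names against a compiled content model."""
    sequence = "".join(child.tag + " " for child in element)
    return model.fullmatch(sequence) is not None


good = ET.fromstring(
    "<video><video_object_set/><object_hierarchy/><entity_relation_graph/></video>")
bad = ET.fromstring("<video><object_hierarchy/><video_object_set/></video>")
print(children_match(good, VIDEO_CONTENT_MODEL),
      children_match(bad, VIDEO_CONTENT_MODEL))  # True False
```

The second document fails because the sequence in the content model is ordered: the video object set must come first.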

The video object set 924 can be defined as follows. 



<!-- Video object set element -->
<!-- An object set consists of one or more video objects -->
<!ELEMENT video_object_set (video_object+)>

<!-- Video object element -->
<!-- Video object elements consist of the following elements:
     - optional vid_obj_media_features element
     - optional vid_obj_semantic_features element
     - optional vid_obj_visual_features element
     - optional vid_obj_temporal_features element -->
<!ELEMENT video_object (vid_obj_media_features?,
                        vid_obj_semantic_features?,
                        vid_obj_visual_features?,
                        vid_obj_temporal_features?)>

<!-- Video object elements can carry an ID attribute, which must be unique within each description -->
<!ATTLIST video_object
          type            (LOCAL|SEGMENT|GLOBAL) #REQUIRED
          id              ID                     #IMPLIED
          object_ref      IDREF                  #IMPLIED
          object_node_ref IDREFS                 #IMPLIED
          entity_node_ref IDREFS                 #IMPLIED>

<!-- Feature elements: media, semantic, temporal and visual -->
<!-- Video object media features element consists of optional location,
     file_format, file_size, resolution, modality_transcoding and bit_rate elements -->
<!ELEMENT vid_obj_media_features (location?, file_format?, file_size?,
                                  resolution?, modality_transcoding?, bit_rate?)>

<!-- Video object semantic features element consists of an optional text_annotation
     and the 6-W elements -->
<!ELEMENT vid_obj_semantic_features (text_annotation?, who?, what_object?, what_action?,
                                     where?, why?, when?)>

<!-- Video object visual features element consists of image_scl, color, texture, shape,
     size, position, motion, video_scl, visual_sprite, transition and camera_motion
     elements, and multiple key_frame elements -->
<!ELEMENT vid_obj_visual_features (image_scl?, color?, texture?, shape?, size?,
                                   position?, motion?, video_scl?, visual_sprite?,
                                   transition?, camera_motion?, key_frame*)>



In the above example, the first declaration indicates that a video object
set element (<video_object_set>) 924 contains one or more video objects
(<video_object>) 930. The second declaration indicates that a video object 930
contains optional video object media feature (<vid_obj_media_features>) 936,
semantic feature (<vid_obj_semantic_features>) 940, visual feature
(<vid_obj_visual_features>) 938 and temporal feature
(<vid_obj_temporal_features>) 937 elements. In addition, the video object tag is
defined as having one required attribute, type, that can take only three possible values
(LOCAL, SEGMENT, GLOBAL), and four optional attributes, id, object_ref,
object_node_ref and entity_node_ref, of type ID, IDREF, IDREFS and IDREFS,
respectively.

Some XML tags include attributes. Attributes are intended for extra
information associated with an element (such as an ID). The feature declarations in
the example shown above correspond to the video object media feature 936, semantic
feature 940 and visual feature 938 elements; the corresponding temporal feature 937
declaration is not reproduced here. These elements group feature elements according
to the information they provide. For example, the media features element
(<vid_obj_media_features>) 936 contains optional location, file_format, file_size,
resolution, modality_transcoding and bit_rate elements to define the descriptors of the
media features 936. The semantic features element (<vid_obj_semantic_features>)
contains an optional text annotation and the 6-W elements corresponding to the
semantic feature descriptors 940. The visual features element
(<vid_obj_visual_features>) contains optional image_scl, color, texture, shape, size,
position, motion, video_scl, visual_sprite, transition and camera_motion elements,
and multiple key_frame elements for the visual feature descriptors. The temporal
features element (<vid_obj_temporal_features>) contains an optional time element as
the temporal feature descriptor.

In the exemplary DTD listed in Appendix A, for the sake of clarity and
flexibility, feature elements are declared in external DTDs using entities. The
following description sets forth a preferred method of referencing a separate external
DTD for each of these elements.

In the simplest case, DTDs include all the tags used in a document. This
technique becomes unwieldy with longer documents. Furthermore, it may be
desirable to use different parts of a DTD in many different places. External DTDs
enable large DTDs to be built from smaller ones. That is, one DTD may link to
another and, in so doing, pull in the elements and entities declared in the other.
Smaller DTDs are easier to analyze. DTDs are connected with external parameter
entity references, as illustrated in the following example:

<!ENTITY % camera_motion SYSTEM
    "http://www.ee.columbia.edu/mpeg7/xml/features/camera_motion.dtd">
%camera_motion;

The object hierarchy can be defined in the image DTD. The following 
example provides an overview of a declaration for the present object hierarchy 
element. 

<!-- Object hierarchy element -->
<!-- A hierarchy element consists of one root node -->
<!ELEMENT object_hierarchy (object_node)>

<!-- The object hierarchy element has two optional attributes: an id and a type -->
<!ATTLIST object_hierarchy
          id   ID    #IMPLIED
          type CDATA #IMPLIED>

<!-- Object node element -->
<!-- Object node elements consist of zero or more object node elements -->
<!ELEMENT object_node (object_node*)>

<!-- Object node elements can have an id attribute of type ID and must have
     an object_ref attribute of type IDREF -->
<!ATTLIST object_node
          id         ID    #IMPLIED
          object_ref IDREF #REQUIRED>

The object hierarchy element (<object_hierarchy>) preferably contains a
single root object node element (<object_node>). An object node element generally
contains zero or more object node elements. Each object node element can have an
associated unique identifier, id. The identifier is expressed as an optional attribute of
type ID, e.g., <object_node id="on1" object_ref="o1">. Each object node element
also includes a reference to a video object element, using the unique identifier
associated with each video object. The reference to the video object element is given
as an attribute of type IDREF (object_ref). Object elements can link back to the object
node elements pointing at them by using an attribute of type IDREFS
(object_node_ref).
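The back-links described above can be derived from the hierarchy rather than maintained by hand. The sketch below, assuming a single <video> document that holds both the object set and the hierarchy, resolves each object_node's object_ref (IDREF) against the video object ids and then writes the whitespace-separated object_node_ref (IDREFS) list onto each referenced object. The helper name add_back_links and the node ids are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """
<video>
  <video_object_set>
    <video_object id="O2" type="LOCAL"/>
    <video_object id="O3" type="LOCAL"/>
    <video_object id="O4" type="LOCAL"/>
  </video_object_set>
  <object_hierarchy id="oh1">
    <object_node id="on4" object_ref="O4">
      <object_node id="on2" object_ref="O2"/>
      <object_node id="on3" object_ref="O3"/>
    </object_node>
  </object_hierarchy>
</video>
"""


def add_back_links(video):
    """Set object_node_ref (IDREFS) on each video object from the hierarchy.

    Raises KeyError if an object_ref (IDREF) does not name a video object.
    """
    objects = {o.get("id"): o for o in video.iter("video_object")}
    back = defaultdict(list)
    for node in video.iter("object_node"):
        ref = node.get("object_ref")
        if ref not in objects:
            raise KeyError(f"object_node {node.get('id')!r} points at unknown id {ref!r}")
        node_id = node.get("id")
        if node_id is not None:  # id is #IMPLIED, so it may be absent
            back[ref].append(node_id)
    for obj_id, node_ids in back.items():
        # IDREFS values are whitespace-separated lists of IDs.
        objects[obj_id].set("object_node_ref", " ".join(node_ids))
    return video


video = add_back_links(ET.fromstring(DOC))
for obj in video.iter("video_object"):
    print(obj.get("id"), obj.get("object_node_ref"))
```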

The entity relation graph definition is very similar to that of the object
hierarchy. An example is listed below.

<!-- Entity relation graph element -->
<!-- An entity relation graph element consists of one or more entity relation elements -->
<!ELEMENT entity_relation_graph (entity_relation+)>

<!-- An entity relation graph element can include two attributes: an id and a type -->
<!-- Possible types of entity relation graphs and entity relations follow:
     - Spatial: topological, directional
     - Temporal: topological, directional
     - Semantic -->
<!ATTLIST entity_relation_graph
          id   ID    #IMPLIED
          type CDATA #IMPLIED>

<!-- Entity relation element -->
<!-- An entity relation element consists of one relation, and zero or more entity
     nodes, entity node sets or entity relation elements -->
<!ELEMENT entity_relation (relation, (entity_node | entity_node_set |
                           entity_relation)*)>

<!-- An entity relation element can include a type attribute -->
<!ATTLIST entity_relation
          type CDATA #IMPLIED>

<!-- Relation element -->
<!-- Examples of relations are
     - SPATIAL.TOPOLOGICAL: overlap, etc.
     - SPATIAL.DIRECTIONAL: to the left, to the right, etc.
     - TEMPORAL.TOPOLOGICAL: at the same time, etc.
     - TEMPORAL.DIRECTIONAL: before, after, immediately before, etc.
     - SEMANTIC: father of, etc. -->
<!ELEMENT relation (#PCDATA | code)*>

<!-- Entity node element -->
<!-- This element can contain string data. It can have a unique attribute (id), and
     must include a reference attribute to an object element (object_ref) -->
<!ELEMENT entity_node (#PCDATA)>
<!ATTLIST entity_node
          id         ID    #IMPLIED
          object_ref IDREF #REQUIRED>

<!-- Entity node set element -->
<!ELEMENT entity_node_set (entity_node+)>

The declaration of the entity relation element shows how an element can
contain either one child element or another: the alternatives are separated with a
vertical bar (|) rather than a comma.

The description above sets forth a data structure of a video description
scheme, as well as systems and methods of characterizing video content in accordance
with the present video description scheme. Of course, the present video description
scheme can be used advantageously in connection with the systems described in
connection with Figures 7 and 8.
Although the present invention has been described in connection
with specific exemplary embodiments, it should be understood that various changes,
substitutions and alterations can be made to the disclosed embodiments without
departing from the spirit and scope of the invention as set forth in the appended
claims.
CLAIMS

1 . A system for generating a description record from video 
information, comprising: 

at least one video input interface for receiving said video information;

a computer processor coupled to said at least one video input 
interface for receiving said video information therefrom, processing said video 
information by performing video object extraction processing to generate video object 
descriptions from said video information, processing said generated video object 
descriptions by object hierarchy construction and extraction processing to generate 
video object hierarchy descriptions, and processing said generated video object 
descriptions by entity relation graph generation processing to generate entity relation 
graph descriptions, wherein at least one description record including said video object 
descriptions, said video object hierarchy descriptions and said entity relation graph 
descriptions is generated to represent content embedded within said video 
information; and 

a data storage system, operatively coupled to said processor, for 
storing said at least one description record. 

2. The system of claim 1, wherein said video object extraction
processing and said object hierarchy construction and extraction processing are 
performed in parallel. 



3. The system of claim 1, wherein said video object extraction 
processing comprises: 
video segmentation processing to segment each video in said video
information into regions within said video; and 

feature extraction and annotation processing to generate one or more 
feature descriptions for one or more said regions; 

whereby said generated video object descriptions comprise said one 
or more feature descriptions for one or more said regions. 



4. The system of claim 3, wherein said regions are selected 
from the group consisting of local, segment and global regions. 



5. The system of claim 3, wherein said one or more feature 
descriptions are selected from the group consisting of media features, visual features, 
temporal features, and semantic features. 



6. The system of claim 5, wherein said semantic features are 
further defined by at least one feature description selected from the group consisting 
of who, what object, what action, where, when, why, and text annotation. 



7. The system of claim 5, wherein said visual features are 
further defined by at least one feature description selected from the group consisting 
of color, texture, position, size, shape, motion, camera motion, editing effect, and 
orientation. 



8. The system of claim 5, wherein said media features are
further defined by at least one feature description selected from the group consisting
of file format, file size, color representation, resolution, data file location, author,
creation, scalable layer and modality transcoding.

9. The system of claim 5, wherein said temporal features are
further defined by at least one feature description selected from the group consisting
of start time, end time and duration.

10. The system of claim 1, wherein said object hierarchy
construction and extraction processing generates video object hierarchy descriptions
of said video object descriptions based on visual feature relationships of video objects
represented by said video object descriptions.

11. The system of claim 1, wherein said object hierarchy
construction and extraction processing generates video object hierarchy descriptions
of said video object descriptions based on semantic feature relationships of video
objects represented by said video object descriptions.

12. The system of claim 1, wherein said object hierarchy
construction and extraction processing generates video object hierarchy descriptions
of said video object descriptions based on media feature relationships of video objects
represented by said video object descriptions.

13. The system of claim 1, wherein said object hierarchy
construction and extraction processing generates video object hierarchy descriptions
of said video object descriptions based on relationships of video objects represented
by said video object descriptions, wherein said relationships are selected from the
group consisting of visual feature relationships, semantic feature relationships,
temporal feature relationships and media feature relationships.

14. The system of claim 1, wherein said object hierarchy
construction and extraction processing generates video object hierarchy descriptions
of said video object descriptions based on relationships of video objects represented
by said video object descriptions, wherein said video object hierarchy descriptions
have a plurality of hierarchical levels.

15. The system of claim 14, wherein said video object hierarchy
descriptions having a plurality of hierarchical levels comprise clustering hierarchies.

16. The system of claim 15, wherein said clustering hierarchies
are based on relationships of video objects represented by said video object
descriptions, wherein said relationships are selected from a group consisting of visual
feature relationships, semantic feature relationships, temporal relationships and media
feature relationships.

17. The system of claim 15, wherein said video object hierarchy
descriptions having a plurality of hierarchical levels are configured to comprise
multiple levels of abstraction hierarchies.

18. The method of claim 17, wherein said multiple levels of
abstraction hierarchies are configured to be based on relationships of video objects
represented by said video object descriptions, wherein said relationships are selected
from a group consisting of visual feature relationships, semantic feature relationships,
temporal feature relationships and media feature relationships.

19. The system of claim 1, wherein said entity relation graph
generation processing generates entity relation graph descriptions of said video object
descriptions based on relationships of video objects represented by said video object
descriptions, wherein said relationships are selected from the group consisting of
visual feature relationships, semantic feature relationships, temporal feature
relationships and media feature relationships.

20. The system of claim 1, further comprising an encoder for
receiving and encoding said video object descriptions into encoded description
information, wherein said data storage system is operative to store said encoded
description information as said at least one description record.

21. The system of claim 1, wherein said video object
descriptions, said video object hierarchy descriptions, and said entity relation graph
descriptions are combined together to form video descriptions, and further comprising
an encoder for receiving and encoding said video descriptions into encoded
description information, wherein said data storage system is operative to store said
encoded description information as said at least one description record.

22. The system of claim 21, wherein said encoder comprises a
binary encoder.

23. The system of claim 21, wherein said encoder comprises an
XML encoder.
24. The system of claim 1, further comprising:

a video display device operatively coupled to the computer 
processor for displaying the video information; and 

at least one user input device operatively coupled to the computer 
processor, wherein at least a portion of said video object processing includes receiving 
a user input through manipulation of said user input device. 



25. A method for generating a description record from video 
information, comprising the steps of: 

receiving said video information; 

processing said video information by performing video object 
extraction processing to generate video object descriptions from said video 
information; 

processing said generated video object descriptions by object 
hierarchy construction and extraction processing to generate video object hierarchy 
descriptions; 

processing said generated video object descriptions by entity 
relation graph generation processing to generate entity relation graph descriptions, 
wherein at least one description record including said video object descriptions, said 
video object hierarchy descriptions and said entity relation graph descriptions is 
generated to represent content embedded within said video information; and 

storing said at least one description record. 



26. The method of claim 25, wherein said steps of video object 
extraction processing and object hierarchy construction and extraction processing are 
performed in parallel. 
27. The method of claim 25, wherein said step of video object
extraction processing comprises the further steps of:

video segmentation processing to segment each video in said video 
information into regions within said video; and 

feature extraction and annotation processing to generate one or more 
feature descriptions for one or more said regions; 

whereby said generated video object descriptions comprise said one 
or more feature descriptions for one or more said regions. 



28. The method of claim 27, wherein said regions are selected 
from the group consisting of local, segment and global regions. 



29. The method of claim 27, further comprising the step of 
selecting said one or more feature descriptions from the group consisting of media 
features, visual features, temporal and semantic features. 



30. The method of claim 29, wherein said semantic features are 
further defined by at least one feature description selected from the group consisting 
of who, what object, what action, where, when, why and text annotation. 



31. The method of claim 29, wherein said visual features are
further defined by at least one feature description selected from the group consisting 
of color, texture, position, size, shape, motion, editing effect, camera motion and 
orientation. 
32. The method of claim 29, wherein said media features are
further defined by at least one feature description selected from the group consisting 
of file format, file size, color representation, resolution, data file location, author, 
creation, scalable layer and modality transcoding. 

33. The method of claim 29, wherein said temporal features are 
further defined by at least one feature description selected from the group consisting 
of start time, end time and duration. 
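The sketch below is a hypothetical illustration of claims 27 to 33. It attaches media, visual, temporal and semantic feature descriptions to a segmented region of one of the kinds listed in claim 28. The Region fields are assumptions chosen for brevity, as are the use of a mean RGB color as the color feature and the bounding-box area as the size feature. The claims do not specify how any feature is computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Region kinds listed in claim 28.
REGION_KINDS = ("local", "segment", "global")


@dataclass
class FeatureDescriptions:
    media: Dict[str, object] = field(default_factory=dict)     # e.g. file format, resolution
    visual: Dict[str, object] = field(default_factory=dict)    # e.g. color, position, size
    temporal: Dict[str, float] = field(default_factory=dict)   # start time, end time, duration
    semantic: Dict[str, str] = field(default_factory=dict)     # e.g. who, what object, where


@dataclass
class Region:
    """Hypothetical output of video segmentation processing."""
    region_id: str
    kind: str                            # one of REGION_KINDS
    pixels: List[Tuple[int, int, int]]   # RGB samples belonging to the region
    bbox: Tuple[int, int, int, int]      # x, y, width, height in pixels
    start: float                         # seconds
    end: float                           # seconds


def describe_region(region: Region,
                    media: Optional[Dict[str, object]] = None,
                    annotation: Optional[Dict[str, str]] = None) -> FeatureDescriptions:
    """Feature extraction and annotation for one region (toy features only)."""
    if region.kind not in REGION_KINDS:
        raise ValueError(f"unknown region kind: {region.kind}")
    if not region.pixels:
        raise ValueError("region has no pixels")
    n = len(region.pixels)
    # Mean color over the region's pixels, one value per RGB channel.
    mean_color = tuple(sum(p[c] for p in region.pixels) / n for c in range(3))
    x, y, w, h = region.bbox
    return FeatureDescriptions(
        media=dict(media or {}),
        visual={"color": mean_color, "position": (x, y), "size": w * h},
        temporal={"start_time": region.start,
                  "end_time": region.end,
                  "duration": region.end - region.start},
        semantic=dict(annotation or {}),   # manual or automatic text annotation
    )
```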

34. The method of claim 25, wherein said step of object 
hierarchy construction and extraction processing generates video object hierarchy 
descriptions of said video object descriptions based on visual feature relationships of 
video objects represented by said video object descriptions. 

35. The method of claim 25, wherein said step of object 
hierarchy construction and extraction processing generates video object hierarchy 
descriptions of said video object descriptions based on semantic feature relationships 
of video objects represented by said video object descriptions. 

36. The method of claim 25, wherein said step of object 
hierarchy construction and extraction processing generates video object hierarchy 
descriptions of said video object descriptions based on media feature relationships of 
video objects represented by said video object descriptions. 



37. The method of claim 25, wherein said step of object 
hierarchy construction and extraction processing generates video object hierarchy 
descriptions of said video object descriptions based on temporal feature relationships
of video objects represented by said video object descriptions. 

38. The method of claim 25, wherein said step of object 
hierarchy construction and extraction processing generates video object hierarchy
descriptions of said video object descriptions based on relationships of video objects
represented by said video object descriptions, wherein said relationships are selected 
from the group consisting of visual feature relationships, semantic feature 
relationships, temporal feature relationships and media feature relationships. 

39. The method of claim 25, wherein said step of object
hierarchy construction and extraction processing generates video object hierarchy
descriptions of said video object descriptions based on relationships of video objects 
represented by said video object descriptions, wherein said video object hierarchy 
descriptions are configured to have a plurality of hierarchical levels. 

40. The method of claim 39, wherein said video object hierarchy 
descriptions having a plurality of hierarchical levels are configured to comprise
clustering hierarchies.

41. The method of claim 40, wherein said clustering hierarchies
are configured to be based on relationships of video objects represented by said video
object descriptions, wherein said relationships are selected from a group consisting of
visual feature relationships, semantic feature relationships, temporal feature
relationships and media feature relationships.
42. The method of claim 40, wherein said video object hierarchy
descriptions having a plurality of hierarchical levels are configured to
comprise multiple levels of abstraction hierarchies.



43. The method of claim 40, wherein said multiple levels of 
abstraction hierarchies are configured to be based on relationships of video objects 
represented by said video object descriptions, wherein said relationships are selected 
from a group consisting of visual feature relationships, semantic feature relationships, 
temporal feature relationships and media feature relationships. 
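Claims 39 to 43 recite hierarchies with several levels, including clustering hierarchies built from feature relationships. The following is one assumed way to form such a clustering hierarchy: bottom-up agglomerative clustering on a visual feature vector. Centroid linkage, Euclidean distance and the generated node names c0, c1, ... are illustrative choices and are not taken from the source.

```python
import math
from typing import Dict, List, Tuple


def build_cluster_hierarchy(features: Dict[str, Tuple[float, ...]]) -> Dict[str, List[str]]:
    """Agglomerative clustering with centroid linkage; returns parent -> [child, child]."""
    # Each active cluster: id -> (centroid vector, number of member objects).
    clusters = {oid: (list(vec), 1) for oid, vec in features.items()}
    children: Dict[str, List[str]] = {}
    next_id = 0
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        # Find the closest pair of clusters by Euclidean distance between centroids.
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                d = math.dist(clusters[a][0], clusters[b][0])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        (ca, na), (cb, nb) = clusters.pop(a), clusters.pop(b)
        # Size-weighted centroid of the merged cluster.
        merged = [(x * na + y * nb) / (na + nb) for x, y in zip(ca, cb)]
        node = f"c{next_id}"  # hypothetical naming; assumes no object id clashes with it
        next_id += 1
        clusters[node] = (merged, na + nb)
        children[node] = [a, b]
    return children
```

Each merge adds one level above the objects it joins, so the last node created is the root of a multi-level hierarchy. Swapping the distance function for a semantic, temporal or media similarity would give the other relationship bases listed in claims 41 and 43.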



44. The method of claim 25, wherein said step of entity relation 
graph generation processing generates entity relation graph descriptions of said video 
object descriptions based on relationships of video objects represented by said video 
object descriptions, wherein said relationships are selected from the group consisting 
of visual feature relationships, semantic feature relationships, temporal feature 
relationships and media feature relationships. 
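For claim 44, a minimal sketch of entity relation graph generation could compare every ordered pair of video objects and emit a labelled edge for each relationship it detects. The object fields (bbox, start, end) and the relation names left_of and before are hypothetical. Only one spatial (visual) test and one temporal test are shown. Semantic or media relationships could be added in the same way.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source object, relation, target object)


def build_entity_relation_graph(objects: Dict[str, dict]) -> List[Edge]:
    """Return directed, labelled edges between video objects."""
    edges: List[Edge] = []
    for a, b in permutations(sorted(objects), 2):
        oa, ob = objects[a], objects[b]
        ax, _, aw, _ = oa["bbox"]   # bbox = (x, y, width, height)
        bx = ob["bbox"][0]
        if ax + aw <= bx:
            edges.append((a, "left_of", b))   # spatial (visual) relationship
        if oa["end"] <= ob["start"]:
            edges.append((a, "before", b))    # temporal relationship
    return edges
```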



45. The method of claim 25, further comprising the steps of 
receiving and encoding said video object descriptions into encoded description 
information, and storing said encoded description information as said at least one 
description record. 



46. The method of claim 25, further comprising the steps of 
combining said video object descriptions, said video object hierarchy descriptions, 
and said entity relation graph descriptions to form video descriptions, and receiving 
and encoding said video descriptions into encoded description information, and 
storing said encoded description information as said at least one description record. 



47. The method of claim 46, wherein said step of encoding
comprises the step of binary encoding. 



48. The method of claim 46, wherein said step of encoding 
comprises the step of XML encoding. 
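Claims 45 to 48 recite encoding the combined descriptions, for example as XML or in binary form. The sketch below uses Python's standard xml.etree.ElementTree module. The element and attribute names are assumptions and do not follow any particular description schema. The binary function is only a stand-in that compresses the UTF-8 XML with zlib. It is not a standardized binary format.

```python
import xml.etree.ElementTree as ET
import zlib
from typing import Dict, List, Tuple


def encode_xml(objects: List[Dict[str, str]],
               hierarchy: Dict[str, List[str]],
               edges: List[Tuple[str, str, str]]) -> str:
    """Combine object, hierarchy and relation-graph descriptions into one XML string."""
    root = ET.Element("VideoDescription")   # hypothetical element names throughout
    objs = ET.SubElement(root, "Objects")
    for obj in objects:
        el = ET.SubElement(objs, "Object", id=obj["id"])
        for name, value in obj.items():
            if name != "id":
                ET.SubElement(el, "Feature", name=name).text = str(value)
    hier = ET.SubElement(root, "ObjectHierarchy")
    for parent, kids in hierarchy.items():
        node = ET.SubElement(hier, "Node", id=parent)
        for kid in kids:
            ET.SubElement(node, "Child", ref=kid)
    graph = ET.SubElement(root, "EntityRelationGraph")
    for src, rel, dst in edges:
        ET.SubElement(graph, "Relation", type=rel, source=src, target=dst)
    return ET.tostring(root, encoding="unicode")


def encode_binary(xml_text: str) -> bytes:
    """Stand-in binary encoding: compressed UTF-8 XML (not a standard binary format)."""
    return zlib.compress(xml_text.encode("utf-8"))
```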



49. A computer readable media containing digital information
with at least one description record representing video content embedded within
corresponding video information, the at least one description record comprising: 

one or more video object descriptions generated from said video 
information using video object extraction processing; 

one or more video object hierarchy descriptions generated from said
generated video object descriptions using object hierarchy construction and extraction
processing; and 

one or more entity relation graph descriptions generated from said 
generated video object descriptions using entity relation graph generation processing. 



50. The computer readable media of claim 49, wherein said
video object descriptions, said video object hierarchy descriptions, and said entity
relation graph descriptions further comprise one or more feature descriptions. 






51. The computer readable media of claim 50, wherein said one
or more feature descriptions are selected from the group consisting of media features, 
visual features, temporal features and semantic features. 
52. The computer readable media of claim 51, wherein said
semantic features are further defined by at least one feature description selected from
the group consisting of who, what object, what action, where, when, why and text 
annotation. 



53. The computer readable media of claim 51, wherein said
visual features are further defined by at least one feature description selected from the
group consisting of color, texture, position, size, shape, motion, camera motion,
editing effect and orientation.



54. The computer readable media of claim 51, wherein said
media features are further defined by at least one feature description selected from the
group consisting of file format, file size, color representation, resolution, data file
location, author, creation, scalable layer and modality transcoding.



55. The computer readable media of claim 51, wherein said 
temporal features are further defined by at least one feature description selected from 
the group consisting of start time, end time and duration. 



56. The computer readable media of claim 49, wherein said 
video object hierarchy descriptions are based on visual feature relationships of video objects
represented by said video object descriptions. 
57. The computer readable media of claim 49, wherein said
video object hierarchy descriptions are based on semantic feature relationships of 
video objects represented by said video object descriptions. 



58. The computer readable media of claim 49, wherein said 
video object hierarchy descriptions are based on media feature relationships of video 
objects represented by said video object descriptions. 



59. The computer readable media of claim 49, wherein said 
video object hierarchy descriptions are based on temporal feature relationships of 
video objects represented by said video object descriptions. 



60. The computer readable media of claim 49, wherein said 
video object hierarchy descriptions are based on relationships of video objects 
represented by said video object descriptions, wherein said video object hierarchy 
descriptions have a plurality of hierarchical levels.



61. The computer readable media of claim 60, wherein said
video object hierarchy descriptions having a plurality of hierarchical levels comprise
clustering hierarchies. 



62. The computer readable media of claim 61, wherein said
clustering hierarchies are based on relationships of video objects represented by said 
video object descriptions, wherein said relationships are selected from a group 
consisting of visual feature relationships, semantic feature relationships, temporal 
feature relationships and media feature relationships. 
63. The computer readable media of claim 62, wherein said
video object hierarchy descriptions having a plurality of hierarchical levels are 
configured to comprise multiple levels of abstraction hierarchies. 



64. The computer readable media of claim 63, wherein said 
multiple levels of abstraction hierarchies are configured to be based on relationships 
of video objects represented by said video object descriptions, wherein said 
relationships are selected from a group consisting of visual feature relationships, 
semantic feature relationships, temporal feature relationships and media feature 
relationships. 



65. The computer readable media of claim 49, wherein said 
entity relation graph descriptions are based on relationships of video objects 
represented by said video object descriptions, wherein said relationships are selected 
from the group consisting of visual feature relationships, semantic feature 
relationships, temporal feature relationships and media feature relationships. 



66. The computer readable media of claim 49, wherein said 
video object descriptions are in the form of encoded description information. 



67. The computer readable media of claim 49, wherein said 
video object descriptions, said video object hierarchy descriptions, and said entity 
relation graph descriptions are combined together in the form of encoded description
information.
68. The computer readable media of claim 67, wherein said
encoded description information is in the form of binary encoded information. 

69. The computer readable media of claim 67, wherein said 
encoded description information is in the form of XML encoded information. 

70. The system of claim 1, wherein feature descriptions include 
pointers to extraction and matching code to facilitate code downloading. 

71. The system of claim 5, wherein feature descriptions include
pointers to extraction and matching code to facilitate code downloading. 

72. The method of claim 25, wherein feature descriptions 
include pointers to extraction and matching code to facilitate code downloading. 

73. The method of claim 29, wherein feature descriptions 
include pointers to extraction and matching code to facilitate code downloading. 

74. The computer readable media of claim 49, wherein feature 
descriptions include pointers to extraction and matching code to facilitate code 
downloading. 



75. The computer readable media of claim 53, wherein feature 
descriptions include pointers to extraction and matching code to facilitate code 
downloading. 
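Claims 70 to 75 recite feature descriptions that carry pointers to extraction and matching code. In the hypothetical sketch below, each description stores two code URIs, and a local registry dictionary stands in for the download step. The field names, the urn:example: identifiers, the normalized histogram used as extraction code and the L1 distance used as matching code are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FeatureDescription:
    name: str
    values: List[float]
    extraction_code_uri: str   # pointer to extraction code (hypothetical identifier)
    matching_code_uri: str     # pointer to matching code (hypothetical identifier)


def _normalize(counts: List[float]) -> List[float]:
    total = sum(counts)
    return [c / total for c in counts]


def _l1_distance(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


# Stand-in for code downloading: URIs resolve to callables already present locally.
CODE_REGISTRY: Dict[str, Callable] = {
    "urn:example:extract:normalized-histogram": _normalize,
    "urn:example:match:l1": _l1_distance,
}


def describe(name: str, raw: List[float], extraction_uri: str, matching_uri: str) -> FeatureDescription:
    extractor = CODE_REGISTRY[extraction_uri]   # a real system might fetch this code remotely
    return FeatureDescription(name, extractor(raw), extraction_uri, matching_uri)


def match(d1: FeatureDescription, d2: FeatureDescription) -> float:
    if d1.matching_code_uri != d2.matching_code_uri:
        raise ValueError("descriptions point to different matching code")
    return CODE_REGISTRY[d1.matching_code_uri](d1.values, d2.values)
```

Because the matching code is identified by the description itself, two descriptions that name different matchers are rejected rather than compared.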